> On 29. Jan 2022, at 22:08, Taylor R Campbell <riastradh%NetBSD.org@localhost> wrote:
>
> New draft changes to resolve open/close/attach/detach races.
>
> There's one snag in here that I haven't been able to resolve -- a race
> in concurrent revoke and detach; maybe hannken@ can help? The snag is
> that vdevgone needs to revoke all existing device nodes _and_ wait for
> the revocation to complete -- even if it is happening concurrently via
> the revoke(2) system call.
>
> But spec_lookup_by_dev will just skip vnodes currently being revoked,
> and possibly return before they have finished being revoked. I tried
> making spec_lookup_by_dev wait with vdead_check(vp, 0) instead of
> vdead_check(vp, VDEAD_NOWAIT) in some circumstances, but that didn't
> work -- it led to deadlock or livelock (and then kernel lock spinout),
> depending on how I did it.
>
> My attempts to make this work may have failed because vdead_check is
> forbidden if the caller doesn't hold a vnode reference, and it's not
> clear taking a vnode reference is allowed at this point.

Generally, taking a new reference to a vnode being reclaimed should not be
allowed. Do you have a recipe so I could try it here?

> (I don't entirely understand ad@'s recent(ish) changes to allow it in
> some cases, which strikes me as a regression from the system we had
> before where VOP_INACTIVE's decision is final -- a huge improvement
> over the piles of bug-ridden incoherent gobbledegook we used to have
> to deal with vnodes being revived in the middle of reclamation.)

While ad@ introduced a race to vrelel() where a vnode cycle may miss the
final VOP_INACTIVE() I don't see a regression here. The vnode lock already
serializes VOP_INACTIVE() and all that is missing is a path to retry the
vrelel() if the vnode gained another reference.

<snip>

--
J. Hannken-Illjes - hannken%mailbox.org@localhost