Subject: Re: pr/35143 and layer_node_find()
To: Bill Studenmund <wrstuden@netbsd.org>
From: Chuck Silvers <chuq@chuq.com>
List: tech-kern
Date: 12/01/2006 09:52:49
On Thu, Nov 30, 2006 at 10:10:22AM -0800, Bill Studenmund wrote:
> > I think we need to figure that out before we decide on a fix.
>
> Ok, here's an idea on how it can happen on our current kernel.
>
> We have one process holding the lock on the lower vnode, upper vnode is
> unreferenced.
>
> Then another process comes into the layered file system, does a lookup on
> vnode, and blocks in VOP_LOOKUP() on the lower layer waiting for the lock.
>
> Then a third process comes in and decides to recycle a vnode. It gets the
> layer vnode, sets VXLOCK, then goes to sleep waiting to get the stack's
> vnode lock.

Wouldn't getcleanvnode() skip over this locked VLAYER vnode?

> First process finishes doing whatever, and releases the lock on the stack.
> Both the second and third processes are marked runnable.
>
> Second process gets the lock and proceeds to get the vnode above the lower
> node, the same vnode the third process wants to recycle. vget() blocks as
> it sees the VXLOCK flag set.
>
> We are now deadlocked.
>
> We have to have the vget() not wait if it sees VXLOCK.
>
> I still don't see what's wrong with letting the being-destroyed nodes stay
> in the hash table. For them to have VXLOCK set, there has to be a thread
> reclaiming them, so they will be removed from the hash list in due time.

I don't know that it would cause any particular problem right now,
it just doesn't seem like a good idea to allow multiple vnodes with
the same identity to exist at the same time, even if all but one of
them are in the process of being reclaimed.

-Chuck

> The only other alternative I can see is to have vget() detect VXLOCK,
> unlock the lower node, relock the lower node, and try again. That would
> give the reclaim time to remove the dying upper node.
>
> Take care,
>
> Bill
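
For reference, the "detect VXLOCK, unlock the lower node, relock, and try
again" alternative quoted above could be sketched roughly as follows. This
is a single-threaded user-space toy, not kernel code: every name here
(sk_lower, sk_upper, sk_hash, SK_VXLOCK, sk_vget_nowait, sk_reclaim_step,
sk_layer_find) is a hypothetical stand-in, and sk_reclaim_step() merely
simulates the reclaiming thread making progress once the lower lock is free.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the NetBSD kernel structures. */
#define SK_VXLOCK 0x1			/* upper node is being reclaimed */

struct sk_lower { int locked; };	/* lower vnode and its lock state */
struct sk_upper { int flags; struct sk_lower *lower; };
struct sk_hash  { struct sk_upper *slot; };	/* one-slot toy hash table */

/* Non-blocking get: return EBUSY instead of sleeping on a dying node. */
static int
sk_vget_nowait(struct sk_upper *up)
{
	return (up->flags & SK_VXLOCK) ? EBUSY : 0;
}

/*
 * Simulates the reclaiming thread: it can only finish, and remove the
 * dying upper node from the hash, while the lower lock is not held.
 */
static void
sk_reclaim_step(struct sk_hash *h)
{
	if (h->slot != NULL && (h->slot->flags & SK_VXLOCK) &&
	    !h->slot->lower->locked)
		h->slot = NULL;
}

/*
 * Look up the upper node for `lo`.  The caller holds the lower lock.
 * Returns the live upper node, or NULL if none exists and the caller
 * would have to create one.  *retries counts the unlock/relock rounds.
 */
static struct sk_upper *
sk_layer_find(struct sk_hash *h, struct sk_lower *lo, int *retries)
{
	assert(lo->locked);
	*retries = 0;
	for (;;) {
		struct sk_upper *up = h->slot;

		if (up == NULL || up->lower != lo)
			return NULL;		/* nothing cached */
		if (sk_vget_nowait(up) == 0)
			return up;		/* live node, use it */

		/* Dying node: let the reclaimer in, then look again. */
		lo->locked = 0;
		sk_reclaim_step(h);
		lo->locked = 1;
		(*retries)++;
	}
}
```

In this toy the lookup never sleeps on the dying node while holding the
lower lock, which is the wait that closes the cycle in the scenario above.
Whether a real implementation could drop and retake the lower lock at that
point without other side effects is not something this sketch addresses.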