Subject: Re: reboot problems unmounting root
To: Bill Stouder-Studenmund <wrstuden@netbsd.org>
From: Antti Kantee <pooka@cs.hut.fi>
List: tech-kern
Date: 07/05/2007 23:50:38
On Thu Jul 05 2007 at 13:14:54 -0700, Bill Stouder-Studenmund wrote:
> > Yes. But the only reason it explodes seems to be the lock pointer
> > exported from the vnode. I, however, do maintain that you should revoke
> > all vnodes down from the upmost you revoked. You just need to make the
> > one above know that it's a deadfs situation. Actually, that's what the
> > code does now. It just doesn't revoke any layerfs vnodes.
>
> Why? What's the functional difference (assuming we fix this issue) between
> a layer node accessing a dead vnode and a vnode that was a layer vnode
> that now is a dead vnode? You get errors for all operations anyway.
>
> I don't see one. And since we have to be able to handle a stack where
> there are multiple mounts above one, we have to deal w/ "unexpected"
> revoking of the underlying vnode. To do that means handling the case of a
> live vnode on top of a dead one, and "handling" means returning buckets of
> errors and not crashing the kernel.
Hmm, I thought I had very good reasoning for that, but I think I lost it.
Maybe I misreasoned.
Anyway, the CURRENT state is that ONLY the lower vnode is being revoked
because of layer_bypass(). The upper is kind of implicitly getting
revoked. Maybe that should be changed to revoke only the upper one.
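To make that concrete, roughly what the bypass amounts to for VOP_REVOKE
(a simplified sketch; the real layer_bypass() of course goes through the
vnodeop descriptor tables instead of hardcoding the op):

#include <sys/param.h>
#include <sys/vnode.h>
#include <miscfs/genfs/layer.h>

/*
 * Simplified sketch: the layer op substitutes the lower vnode and
 * calls the same operation one level down.  The layer vnode itself
 * never goes through vgone()/vclean(), so only the lower vnode ends
 * up with dead_vnodeop_p.
 */
static int
layer_revoke_sketch(struct vnode *layervp, int flags)
{
	struct vnode *lowervp = LAYERVPTOLOWERVP(layervp);

	return VOP_REVOKE(lowervp, flags);
}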
> Huh? I was talking about it being in the free list. vnodes on the free
> list do NOT have VXLOCK set. :-)
>
> Also, we don't leave VXLOCK there forever. :-)
I was talking about VOP_RECLAIM. That's where we want to free the lock
if at all. But n/m that part ;)
> > I don't see how the forced unmount situation is any different from a
> > revoke and therefore why revoke would need a special flag.
>
> A forced unmount shouldn't be doing the same thing as a revoke. It should
> just revoke the at-level vnode. The difference being force-unmounting a
> layer shouldn't blast away the underlying vnodes.
Well, revoke is "check for device aliases + reclaim regardless of
use count". A forced unmount is "reclaim regardless of use count".
I was just talking from the perspective of reclaim, once again.
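In sketch form (from memory, not the literal kernel code):

#include <sys/param.h>
#include <sys/mount.h>
#include <sys/vnode.h>

/* revoke: deal with device aliases, then reclaim no matter what */
static void
revoke_sketch(struct vnode *vp)
{

	/*
	 * The revoke path first vgone()s every alias of the same
	 * device (the VALIASED dance), then the vnode itself; the
	 * use count is ignored.
	 */
	vgone(vp);
}

/* forced unmount: reclaim everything on the mount no matter what */
static void
forced_unmount_sketch(struct mount *mp)
{

	/*
	 * MNT_FORCE ends up in vflush() with FORCECLOSE, which
	 * vgone()s even vnodes that are still in use.  No alias
	 * handling involved.
	 */
	(void)vflush(mp, NULL, FORCECLOSE);
}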
> Revoke is different from unmount _if_ the leaf fs does things differently
> depending on if someone's still using the vnode or not. If it doesn't
> differentiate, then no biggie and no parameter. If however it does
> differentiate, the reason we need a flag is to tell the leaf fs that its
> use count lies.
>
> The reason for the flag is to help the leaf fs. How exactly does the leaf
> fs know if there are users other than the revoker on the vnode? It looks
> at the use count and compares it to one. However if the VOP_REVOKE() came
> through a layered file system, that test isn't good. The one reference
> could well be the one held by a single upper layer, which gives you no
> indication of how many things hold the upper layer's vnode.
>
> That's why I thought it'd matter. If however whatever leaf file systems do
> doesn't care (works right either way), then we don't need said flag.
Ah, ic.
But now remind me why the revoke should be coming down at all?
> > The call sequence is approximately the following:
> >
> > VOP_REVOKE
> > VOP_UNLOCK (vclean() doesn't call inactive because the vnode is still active)
> > VOP_RECLAIM
> > v_op set to deadfs
> > upper layer decides this one is not worth holding on to
> > vrele()
> > VOP_INACTIVE (for deadfs)
> > new start for the vnode
>
> It's not clear here what's calling what.
VOP_REVOKE (generally) -> vgonel -> vclean -> (VOP_UNLOCK, VOP_RECLAIM,
sets v_op to deadfs)
another party: vrele() -> (VOP_INACTIVE(), put vnode on freelist)
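Or written out (a hand sketch of the order of events, not a copy of the
real vclean()):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/vnode.h>

static void
vclean_active_sketch(struct vnode *vp, struct lwp *l)
{

	VOP_UNLOCK(vp, 0);		/* VOP_INACTIVE skipped: still in use */
	if (VOP_RECLAIM(vp, l))
		panic("vclean: cannot reclaim");
	vp->v_op = dead_vnodeop_p;	/* further VOPs are answered by deadfs */
	vp->v_tag = VT_NON;

	/*
	 * Later, from whoever still holds the reference:
	 * vrele() -> VOP_INACTIVE() with deadfs ops -> free list.
	 * No second VOP_RECLAIM happens.
	 */
}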
> Also, unless we short-circuit it, there should be a reclaim when we
> recycle the vnode.
No, there won't (technically). The vnode now uses deadops and
dead_reclaim is nada.
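i.e. something to the effect of (sketch, not the exact deadfs source):

static int
dead_reclaim_sketch(void *v)
{

	/* nothing left to tear down for a dead vnode */
	return (0);
}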
> > Where do you want to free the lock structure? We will never get another
> > VOP_RECLAIM for the original file system method. Do we want to do it in
> > deadfs VOP_INACTIVE? That's ugly, but it's the only place I see where to
> > do it. It also needs an accompanying bit to say "hey, we already nuked
> > the lock in the file system reclaim op" in case the vnode is NOT revoked.
>
> The lock structure is part of struct vnode, so we never have to "free" it.
> We decommission (LK_DRAIN) it as part of reclaiming, but we can un-drain it
> I believe. If we can't, we should figure out a way to.
free, drain, conceptually the same
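If we really wanted to "un-drain", something like the following should
be enough (illustrative names, not existing kernel functions):

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/vnode.h>

static void
vnlock_decommission_sketch(struct vnode *vp)
{

	/* wait out the current holders and leave v_lock unusable */
	(void)lockmgr(&vp->v_lock, LK_DRAIN, NULL);
}

static void
vnlock_rearm_sketch(struct vnode *vp)
{

	/* "un-drain": just re-initialize the embedded lock */
	lockinit(&vp->v_lock, PVFS, "vnlock", 0, 0);
}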
> Something for deadfs VOP_INACTIVE would be good, namely to do something to
> indicate "recycle me soon." Hmm, wait, that's a feature we haven't stolen
> from Darwin yet. :-)
But dead_inactive is called only for vnodes which were forcibly reclaimed.
Others have VOP_INACTIVE called already when they are still attached to
their real file system. All inactive methods would need to be patched
to cope with this, and that adds yet more difficult-to-comprehend side
effects to file system writing.
> > I'd prefer another approach where the whole layering lock export system
> > was redone instead of redoing reclaim to fix the layer lock exporting.
> > But I don't know how to fix layerfs locking ...
>
> Do we care about the interlock? The advantage of the current locking
> behavior is that, since each layer has access to the lockmgr lock, we can
> pass in the right interlock. If we either don't care, or we're fine with a
> lockmgr call on vnode D releasing the interlock on vnode A (because
> someone tried to lock A with its interlock held, and A is above B, which
> is above C, which is above D), then we can just turn it all into recursive
> VOP_LOCK() calls on the underlying vnodes.
>
> Have you read Heidemann's dissertation? The locking export is an attempt at
> the shared-state stack stuff he did.
No. I guess that's saying "I should" ;)
> We may just need to give up on the interlock, as there's not really any
> way we can guarantee it for something like unionfs.
I actually had a similar idea and did a two-layer hack (it doesn't
recurse to the bottom of the stack, but that's easy). It seems to fix
the problems. I can't see it being any worse than the current state of
the art.
Index: kern/vfs_subr.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
retrieving revision 1.288
diff -u -p -r1.288 vfs_subr.c
--- kern/vfs_subr.c 5 Jun 2007 12:31:31 -0000 1.288
+++ kern/vfs_subr.c 5 Jul 2007 20:01:06 -0000
@@ -1649,6 +1649,7 @@ vclean(struct vnode *vp, int flags, stru
*/
vp->v_op = dead_vnodeop_p;
vp->v_tag = VT_NON;
+ vp->v_vnlock = NULL;
simple_lock(&vp->v_interlock);
VN_KNOTE(vp, NOTE_REVOKE); /* FreeBSD has this in vn_pollgone() */
vp->v_flag &= ~(VXLOCK|VLOCKSWORK);
Index: miscfs/genfs/layer_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/miscfs/genfs/layer_vnops.c,v
retrieving revision 1.31
diff -u -p -r1.31 layer_vnops.c
--- miscfs/genfs/layer_vnops.c 16 Apr 2007 08:10:58 -0000 1.31
+++ miscfs/genfs/layer_vnops.c 5 Jul 2007 20:01:06 -0000
@@ -605,10 +605,11 @@ layer_lock(v)
int a_flags;
struct proc *a_p;
} */ *ap = v;
- struct vnode *vp = ap->a_vp, *lowervp;
+ struct vnode *vp = ap->a_vp;
+ struct vnode *lowervp = LAYERVPTOLOWERVP(vp);
int flags = ap->a_flags, error;
- if (vp->v_vnlock != NULL) {
+ if (lowervp->v_vnlock != NULL) {
/*
* The lower level has exported a struct lock to us. Use
* it so that all vnodes in the stack lock and unlock
@@ -635,7 +637,6 @@ layer_lock(v)
* first). But we can LK_DRAIN the upper lock as a step
* towards decomissioning it.
*/
- lowervp = LAYERVPTOLOWERVP(vp);
if (flags & LK_INTERLOCK) {
simple_unlock(&vp->v_interlock);
flags &= ~LK_INTERLOCK;
@@ -666,9 +667,10 @@ layer_unlock(v)
struct proc *a_p;
} */ *ap = v;
struct vnode *vp = ap->a_vp;
+ struct vnode *lowervp = LAYERVPTOLOWERVP(vp);
int flags = ap->a_flags;
- if (vp->v_vnlock != NULL) {
+ if (lowervp->v_vnlock != NULL) {
return (lockmgr(vp->v_vnlock, ap->a_flags | LK_RELEASE,
&vp->v_interlock));
} else {
@@ -676,7 +678,7 @@ layer_unlock(v)
simple_unlock(&vp->v_interlock);
flags &= ~LK_INTERLOCK;
}
- VOP_UNLOCK(LAYERVPTOLOWERVP(vp), flags);
+ VOP_UNLOCK(lowervp, flags);
return (lockmgr(&vp->v_lock, flags | LK_RELEASE,
&vp->v_interlock));
}
--
Antti Kantee <pooka@iki.fi> Of course he runs NetBSD
http://www.iki.fi/pooka/ http://www.NetBSD.org/
"la qualité la plus indispensable du cuisinier est l'exactitude"