Subject: Re: reboot problems unmounting root
To: Bill Stouder-Studenmund <wrstuden@netbsd.org>
From: Antti Kantee <pooka@cs.hut.fi>
List: tech-kern
Date: 07/06/2007 15:02:00

On Thu Jul 05 2007 at 17:27:54 -0700, Bill Stouder-Studenmund wrote:
> On Thu, Jul 05, 2007 at 11:50:38PM +0300, Antti Kantee wrote:
> > On Thu Jul 05 2007 at 13:14:54 -0700, Bill Stouder-Studenmund wrote:
> >
> > Hmm, I thought I had very good reasoning for that, but I think I lost it.
> > Maybe I misreasoned.
> >
> > Anyway, the current state is that only the lower vnode is being revoked
> > because of layer_bypass(). The upper is kind of implicitly getting
> > revoked. Maybe that should be changed to revoke only the upper one.
>
> For sys_revoke() processing, we want to revoke the lower one. That's the
> only way we can destroy all instances of the device and all access paths.
Right. That was my reason.
Although, the description on the manual page (revoke(2)) is a bit wrong:

     The revoke function invalidates all current open file descriptors in
     the system for the file named by path.

It doesn't take aliasing into account.
> > > > I don't see how the forced unmount situation is any different from a
> > > > revoke and therefore why revoke would need a special flag.
> > >
> > > A forced unmount shouldn't be doing the same thing as a revoke. It should
> > > just revoke the at-level vnode. The difference being force-unmounting a
> > > layer shouldn't blast away the underlying vnodes.
> >
> > Well, revoke is "check for device aliases + reclaim regardless of
> > use count". A forced unmount is "reclaim regardless of use count".
> > I was just talking from the perspective of reclaim, once again.
>
> It's different w.r.t. layering. unmount wants to do the top of a stack,
> sys_revoke() wants to do the root.
No, it's not. Forcibly unmount the root layer. I am still talking
about *reclaim*, not revoke. I am talking about reclaim because the
reclaim introduced by revoke is the one causing problems. If you just
give revoke special treatment, the unmount -f problem remains.
> > But now remind me why the revoke should be coming down at all?
>
> Good question. Because we have to revoke ALL access to a given device. A
> layer stack can have fan-out (I think I use the word differently from
> Heidemann, I mean one leaf fs w/ multiple different layers on top). So
> there can be multiple nodes on top of one. Thus to get them all, we have
> to blast the bottom one.
>
> Also, if we revoke a device, we have to revoke anyone who accesses a
> vnode that accesses that driver, not just the vnode we started with. So
> anyone who opens the device in a chroot or opens the device from another
> file system, they all have to go away. That's why we do the aliasing
> stuff. If revoke didn't go all the way down, it wouldn't happen.
Right. Earlier I thought you said we should only nuke the top one and
I was confused. Good that we agree now.
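
To make the fan-out argument concrete, here is a toy model in C. It is
only a sketch: struct toy_node and the toy_* functions are made up for
illustration and have nothing to do with the real struct vnode or
VOP_REVOKE code. The assumption is simply that an access path is dead
as soon as any node on it has been revoked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vnode in a layer stack; NOT the kernel's struct vnode. */
struct toy_node {
	struct toy_node *lower;	/* next node down, NULL for the leaf fs */
	bool revoked;		/* set once this node has been revoked */
};

/* Walk down to the leaf node of the stack. */
static struct toy_node *
toy_bottom(struct toy_node *n)
{
	assert(n != NULL);
	while (n->lower != NULL)
		n = n->lower;
	return n;
}

/* Revoke only the node we were handed (the "top only" variant). */
static void
toy_revoke_top(struct toy_node *n)
{
	n->revoked = true;
}

/* Revoke by going all the way down and blasting the bottom node. */
static void
toy_revoke_bottom(struct toy_node *n)
{
	toy_bottom(n)->revoked = true;
}

/* An access path is usable only if no node on it has been revoked. */
static bool
toy_usable(const struct toy_node *n)
{
	for (; n != NULL; n = n->lower)
		if (n->revoked)
			return false;
	return true;
}
```

With two layer nodes stacked on one leaf, toy_revoke_top() on the first
leaves the second usable, while toy_revoke_bottom() on either one kills
both paths. That is the reason for letting the revoke come down.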
> > > > The call sequence is approximately the following:
> > > >
> > > > VOP_REVOKE
> > > > VOP_UNLOCK (vclean() doesn't call inactive because the vnode is still active)
> > > > VOP_RECLAIM
> > > > v_op set to deadfs
> > > > upper layer decides this one is not worth holding on to
> > > > vrele()
> > > > VOP_INACTIVE (for deadfs)
> > > > new start for the vnode
> > >
> > > It's not clear here what's calling what.
> >
> > VOP_REVOKE (generally) -> vgonel -> vclean -> (VOP_UNLOCK, VOP_RECLAIM,
> > sets v_op to deadfs)
> >
> > another party: vrele() -> (VOP_INACTIVE(), put vnode on freelist)
>
> Ok. Note it could be the same party if no one else had the vnode open. ;-)
No, it can't. If usecount is 0, nobody will call vrele(). vclean() will
call VOP_INACTIVE directly. Hence you either get a call to fs_inactive()
or dead_inactive(), not both.
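
The either/or can be written down as a toy model in C. This is a
sketch of the behaviour as described in this thread, not the actual
vclean()/vrele() code: struct toy_vnode, the counters and all toy_*
names are invented for illustration, and reclaim itself is left out.

```c
#include <assert.h>

/* Which operations vector the toy vnode currently uses. */
enum toy_ops { TOY_FS_OPS, TOY_DEAD_OPS };

/* Toy vnode; the counters only exist so the calls can be observed. */
struct toy_vnode {
	int usecount;
	enum toy_ops ops;
	int fs_inactive_calls;
	int dead_inactive_calls;
};

/* Dispatch "inactive" through whichever ops vector is installed. */
static void
toy_inactive(struct toy_vnode *vp)
{
	if (vp->ops == TOY_FS_OPS)
		vp->fs_inactive_calls++;
	else
		vp->dead_inactive_calls++;
}

/* Models vclean(): call inactive only if nobody holds a reference. */
static void
toy_vclean(struct toy_vnode *vp)
{
	if (vp->usecount == 0)
		toy_inactive(vp);	/* file system ops still installed */
	/* reclaim would happen here */
	vp->ops = TOY_DEAD_OPS;
}

/* Models vrele(): the last reference going away triggers inactive. */
static void
toy_vrele(struct toy_vnode *vp)
{
	assert(vp->usecount > 0);	/* no reference, no vrele() */
	if (--vp->usecount == 0)
		toy_inactive(vp);
}
```

A vnode cleaned with usecount 0 sees one fs-level inactive and no
later vrele(). A vnode cleaned while still referenced sees only the
dead-level inactive when the last holder lets go.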
> The problem I see is that there's no easy way (that I can see) of making
> this extend to all the layering we can have now. What about a stack like:
>
>   A   B
>    \ /
>     C   D
>      \ /
>       L
>
> Where A, B, C, and D are different layer mounts, and L is the leaf file
> system under it all.
>
> Say D processes the revoke, or say it happens directly on L. C and D can
> notice that something changed underneath, but A and B can't easily notice
> a change to L, since they'd only see it if C changed somehow.
My head just exploded. Call me silly, but I'd be happy if a stack like
this worked for starters:

	A
	|
	B

Seriously though, that's what I was talking about when I said recursing
to the bottom. Instead of caching a lock pointer in each layer node,
traverse to the bottom or until a defunct lock is found and act as if
a lock wasn't exported.
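
Roughly what I have in mind, as a toy sketch in C rather than real
layerfs code. Everything here is hypothetical: struct toy_lock, struct
toy_lnode and the toy_* functions are made-up names, "defunct" is
modelled as a plain flag, and falling back to the node's own private
lock is my assumption of what "act as if a lock wasn't exported" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock; "defunct" stands for a lock whose owner has been torn down. */
struct toy_lock {
	bool defunct;
};

/* Toy layer node: no cached lock pointer, only a link to the node below. */
struct toy_lnode {
	struct toy_lnode *lower;	/* NULL for the bottom of the stack */
	struct toy_lock lock;		/* this node's own lock */
};

/*
 * Look up the lock exported from below by walking the stack each time.
 * Returns NULL if there is nothing below or a defunct lock is found on
 * the way down, i.e. "no lock exported".
 */
static struct toy_lock *
toy_lower_lock(struct toy_lnode *n)
{
	struct toy_lnode *cur = n->lower;

	if (cur == NULL)
		return NULL;
	for (;;) {
		if (cur->lock.defunct)
			return NULL;
		if (cur->lower == NULL)
			return &cur->lock;
		cur = cur->lower;
	}
}

/* Pick the lock to use: the bottom one if reachable, else our own. */
static struct toy_lock *
toy_lock_for(struct toy_lnode *n)
{
	struct toy_lock *lk = toy_lower_lock(n);

	return lk != NULL ? lk : &n->lock;
}
```

For the A-on-B stack, A ends up using B's lock as long as it is
healthy, and quietly switches to its own lock once B's lock has gone
defunct. The cost is a walk down the stack on every lock operation
instead of a single cached pointer dereference.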
> For now, let's just undrain the lock, then wait for everything above to
> get torn down.
If it can be made to work, let's.
--
Antti Kantee <pooka@iki.fi> Of course he runs NetBSD
http://www.iki.fi/pooka/ http://www.NetBSD.org/
"la qualité la plus indispensable du cuisinier est l'exactitude"