Subject: Re: Now: Fs suspension take 2
To: Bill Studenmund <wrstuden@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: tech-kern
Date: 08/19/2006 17:31:58
On Fri, Aug 18, 2006 at 03:02:13PM -0700, Bill Studenmund wrote:
> On Thu, Aug 03, 2006 at 04:11:29PM +0200, Juergen Hannken-Illjes wrote:
> > On Wed, Jul 05, 2006 at 10:30:32AM -0700, Bill Studenmund wrote:
> > > On Tue, Jul 04, 2006 at 04:23:55PM +0200, Juergen Hannken-Illjes wrote:
> > > > On Mon, Jul 03, 2006 at 03:51:35PM -0700, Jason Thorpe wrote:
> >
> > [snip]
> >
> > > > 2) Put all gates inside the file systems as
> > >
> > > My preference.
> >
> > Ok, the attached diff puts transaction locks so we can suspend ffs file systems.
>
> Thank you. This is a lot of work!
>
> > - The fstrans implementation goes to files sys/fstrans.h and kern/vfs_trans.c.
> >
> > - Add new VFS operation VFS_SUSPENDCTL(MP, C) to request a file system to
> > suspend/resume.
> >
> > - Transaction locks are used if mnt_iflag has IMNT_HAS_TRANS. Otherwise the
> > current (vn_start_write) are used. With this diff IMNT_HAS_TRANS gets
> > set for ffs type file systems if kernel option NEWVNGATE is set.
>
> How many file systems really supported the old suspension method other
> than ffs? Or is this coexistance more of a smoothing of switching to the
> new method for ffs?
>
> i.e. do we need this?
All file systems on a block device support file system external snapshots.
We add some transaction locks into genfs_XXX and ufs_XXX operations. They
should only lock for file systems that need it.
> > - The ufs_lock/ufs_unlock part may look strange but it is ok to ignore
> > vnode lock requests for the thread currently suspending.
>
> I think it's probably fine to ignore lock requests while the system is
> suspending. However we need to make sure that locking and unlocking stay
> on the same side of the is-suspending state.
>
> i.e. any locks we grab while suspending MUST be released before we
> finish suspending. And any locks grabbed before suspending MUST be
> released after we finish. Otherwise we can leak locks.
>
> It could be that all we have to do is document this well.
Yes. All we do now is to run ffs_sync(). And it doesn't leak locks.
> > - I left the throttling part of my first attempt because it is only needed
> > for multiple softdep enabled file systems on the same disk. This problem
> > should be solved by a congestion control inside softdep.
>
>
> > @@ -762,17 +817,21 @@ ufs_whiteout(void *v)
> > int error;
> > struct ufsmount *ump = VFSTOUFS(dvp->v_mount);
> >
> > error = 0;
> > +
> > switch (ap->a_flags) {
> > case LOOKUP:
> > /* 4.4 format directories support whiteout operations */
> > + fstrans_done(dvp->v_mount);
>
> Why do we call fstrans_done() here? CREATE and DELETE _start_
> transactions, so I don't see where the transaction we're _done'ing starts.
It doesn't start ... Will be removed.
> > if (ump->um_maxsymlinklen > 0)
> > return (0);
> > return (EOPNOTSUPP);
> >
> > case CREATE:
> > /* create a new directory whiteout */
> > + if ((error = fstrans_start(dvp->v_mount, FSTRANS_SHARED)) != 0)
> > + return error;
> > #ifdef DIAGNOSTIC
> > if ((cnp->cn_flags & SAVENAME) == 0)
> > panic("ufs_whiteout: missing name");
> > if (ump->um_maxsymlinklen <= 0)
> > @@ -791,8 +850,10 @@ ufs_whiteout(void *v)
> > break;
> >
> > case DELETE:
> > /* remove an existing directory whiteout */
> > + if ((error = fstrans_start(dvp->v_mount, FSTRANS_SHARED)) != 0)
> > + return error;
> > #ifdef DIAGNOSTIC
> > if (ump->um_maxsymlinklen <= 0)
> > panic("ufs_whiteout: old format filesystem");
> > #endif
> > @@ -807,8 +868,9 @@ ufs_whiteout(void *v)
> > if (cnp->cn_flags & HASBUF) {
> > PNBUF_PUT(cnp->cn_pnbuf);
> > cnp->cn_flags &= ~HASBUF;
> > }
> > + fstrans_done(dvp->v_mount);
> > return (error);
> > }
>
>
> > +/*
> > + * Return whether or not the node is locked.
> > + */
> > +int
> > +ufs_islocked(void *v)
> > +{
> > + struct vop_islocked_args /* {
> > + struct vnode *a_vp;
> > + } */ *ap = v;
> > + struct vnode *vp = ap->a_vp;
> > +
> > + return (lockstatus(vp->v_vnlock));
> > +}
>
> This routine looks like the genfs routine. Do we really need a separate
> one?
I think it is cleaner to move the complete lock family (lock, unlock, islocked).
> > Index: sys/ufs/ffs/ffs_snapshot.c
> > ===================================================================
> > RCS file: /cvsroot/src/sys/ufs/ffs/ffs_snapshot.c,v
> > retrieving revision 1.30
> > diff -p -u -4 -r1.30 ffs_snapshot.c
> > --- sys/ufs/ffs/ffs_snapshot.c 7 Jun 2006 22:34:19 -0000 1.30
> > +++ sys/ufs/ffs/ffs_snapshot.c 30 Jul 2006 16:54:09 -0000
> > @@ -58,8 +58,9 @@ __KERNEL_RCSID(0, "$NetBSD: ffs_snapshot
> > #include <sys/resource.h>
> > #include <sys/resourcevar.h>
> > #include <sys/vnode.h>
> > #include <sys/kauth.h>
> > +#include <sys/fstrans.h>
> >
> > #include <miscfs/specfs/specdev.h>
> >
> > #include <ufs/ufs/quota.h>
> > @@ -284,9 +285,9 @@ ffs_snapshot(struct mount *mp, struct vn
> > * All allocations are done, so we can now snapshot the system.
> > *
> > * Suspend operation on filesystem.
> > */
> > - if ((error = vfs_write_suspend(vp->v_mount, PUSER|PCATCH, 0)) != 0) {
> > + if ((error = vfs_suspend(vp->v_mount, 0)) != 0) {
> > vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
> > goto out;
> > }
> > vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
> > @@ -378,25 +379,33 @@ loop:
> > VI_UNLOCK(xvp);
> > MNT_ILOCK(mp);
> > continue;
> > }
> > +#ifndef NEWVNGATE
> > if (vn_lock(xvp, LK_EXCLUSIVE | LK_INTERLOCK) != 0) {
> > MNT_ILOCK(mp);
> > goto loop;
> > }
> > +#else /* NEWVNGATE */
> > + VI_UNLOCK(xvp);
> > +#endif /* NEWVNGATE */
>
> VI_UNLOCK()?
ffs_snapshot.c is still near to the FreeBSD one. On top it has:
#define VI_UNLOCK(v) simple_unlock(&(v)->v_interlock)
> > Index: sys/kern/vfs_subr.c
> > ===================================================================
> > RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
> > retrieving revision 1.267
> > diff -p -u -4 -r1.267 vfs_subr.c
> > --- sys/kern/vfs_subr.c 23 Jun 2006 14:13:02 -0000 1.267
> > +++ sys/kern/vfs_subr.c 30 Jul 2006 16:54:06 -0000
>
> > @@ -1203,9 +1204,16 @@ vget(struct vnode *vp, int flags)
> > }
> > #endif
> > if (flags & LK_TYPE_MASK) {
> > if ((error = vn_lock(vp, flags | LK_INTERLOCK))) {
> > - vrele(vp);
> > + simple_lock(&vp->v_interlock);
> > + if (vp->v_usecount > 1) {
> > + vp->v_usecount--;
> > + simple_unlock(&vp->v_interlock);
> > + } else {
> > + simple_unlock(&vp->v_interlock);
> > + vrele(vp);
> > + }
> > }
> > return (error);
> > }
> > simple_unlock(&vp->v_interlock);
>
> Why doe we need the above?
It cured a deadlock. I'm not sure it is needed anymore. Will test again.
[snip]
> > Index: sys/uvm/uvm_pdaemon.c
> > Index: common/lib/libc/gmon/mcount.c
> > Index: sys/arch/sparc64/dev/psm.c
These diffs are not part of the fs transactions. I forgot to remove them.
> Take care,
>
> Bill
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)