Subject: Re: Proposal: File system suspension - prerequisite for snapshots
To: None <tech-kern@netbsd.org>
From: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
List: tech-kern
Date: 08/13/2003 23:30:19

On Wed, Aug 13, 2003 at 12:02:01PM -0700, Bill Studenmund wrote:
> On Tue, 12 Aug 2003, Juergen Hannken-Illjes wrote:
>
> > I propose adopting FreeBSD's support for file system suspension.
> >
> > The (quite simple) API:
> >
> >     int
> >     vfs_write_suspend(struct mount *mp)
> >
> >         Request a mounted file system to suspend write operations
> >         and leave it in a clean on-disk state. All in-progress
> >         operations have completed by the time it returns.
> >
> >     void
> >     vfs_write_resume(struct mount *mp)
> >
> >         Request a suspended file system to resume write operations.
> >
> > This is a prerequisite for file system snapshots. It may also help
> > with system suspension. File system snapshots would give us at least
> > safe dumps from running systems and background fsck (on softdep-enabled
> > file systems).
> >
> > The implementation would gate most file system syscalls like this:
> >
> >     if ((error = vn_start_write(vp, &mp, V_WAIT | PCATCH)) != 0)
> >         return (error);
> >     do_the_write_operation
> >     vn_finished_write(mp);
> >
> > or
> >
> >     restart:
> >         prepare_a_write_operation
> >         if (vn_start_write(nd.ni_dvp, &mp, V_NOWAIT) != 0) {
> >             abort_current_preparation
> >             if ((error = vn_start_write(NULL, &mp, V_XSLEEP | PCATCH)) != 0)
> >                 return (error);
> >             goto restart;
> >         }
> >         do_the_write_operation
> >         vn_finished_write(mp);
> >
> > Doing it this way guarantees that no operation sleeps with locked vnodes.
>
> Note: don't you end up calling vn_start_write() _after_ you've been told
> it's ok to start? :-) If you _want_ that to be part of the interface, we
> need to document it. For one, it prevents using a simple reference-counting
> mechanism to determine whether writes are in progress.
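
vn_start_write(..., V_NOWAIT) returns zero on success. A non-zero result
means it is NOT ok to start, so we have to abort, wait until it is ok
(the second call, vn_start_write(..., V_XSLEEP | PCATCH)), and then restart
the syscall. As the pseudocode shows, the V_XSLEEP call only waits; the
write is actually started by the V_NOWAIT call after the restart, so each
successful start is paired with exactly one vn_finished_write().
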
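
To make the reference-counting point concrete, here is a minimal user-space
sketch of such a gate. It is only an illustration under my own assumptions,
not the FreeBSD code and not a proposed NetBSD implementation: the names
(struct gate, gate_start_write() and friends, the GATE_* flags) are
hypothetical, pthreads stand in for kernel sleep/wakeup, and signal
handling (PCATCH) is left out. The point it shows is that a wait-only call
need not take a reference, so a plain counter of in-flight writes can stay
balanced.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical flags, loosely modeled on V_WAIT / V_NOWAIT / V_XSLEEP. */
enum { GATE_WAIT = 1, GATE_NOWAIT = 2, GATE_XSLEEP = 4 };

struct gate {
	pthread_mutex_t lock;
	pthread_cond_t  cv;        /* signalled on resume and on last writer exit */
	int             suspended; /* non-zero while writes are suspended */
	int             writers;   /* number of write operations in flight */
};

void gate_init(struct gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->cv, NULL);
	g->suspended = 0;
	g->writers = 0;
}

/*
 * Returns 0 on success. With GATE_NOWAIT, fails with EWOULDBLOCK instead
 * of sleeping. With GATE_XSLEEP, only waits for the suspension to end and
 * does NOT count the caller as a writer.
 */
int gate_start_write(struct gate *g, int flags)
{
	int error = 0;

	pthread_mutex_lock(&g->lock);
	if (g->suspended && (flags & GATE_NOWAIT)) {
		error = EWOULDBLOCK;
	} else {
		while (g->suspended)
			pthread_cond_wait(&g->cv, &g->lock);
		if (!(flags & GATE_XSLEEP))
			g->writers++;
	}
	pthread_mutex_unlock(&g->lock);
	return error;
}

void gate_finished_write(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->writers > 0);
	if (--g->writers == 0)
		pthread_cond_broadcast(&g->cv); /* wake a pending suspend */
	pthread_mutex_unlock(&g->lock);
}

/* Block new writers, then wait until all in-flight writes have drained. */
void gate_suspend(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->suspended = 1;
	while (g->writers > 0)
		pthread_cond_wait(&g->cv, &g->lock);
	pthread_mutex_unlock(&g->lock);
}

void gate_resume(struct gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->suspended = 0;
	pthread_cond_broadcast(&g->cv); /* wake sleeping writers */
	pthread_mutex_unlock(&g->lock);
}
```

In this model the restart pattern maps to: try gate_start_write(g,
GATE_NOWAIT); on failure drop whatever was prepared, call
gate_start_write(g, GATE_XSLEEP) with nothing locked, and start over.
Only the successful GATE_NOWAIT (or GATE_WAIT) call is matched by a
gate_finished_write().
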
> > It is not possible to put this gating into the VFS_ calls because they
> > are often called with vnodes locked, so a suspend request could deadlock.
> > For the same reason this gating cannot reside below the VFS_ level.
>
> I think I don't like it, but I believe you're right that there are issues
> with doing it at the VOP level with the locked vnodes. i.e. this way may
> well be the best in the long run, even if I don't like it. :-)
>
> Note: you _can_ do it at the VOP level, it would just mean having a
> routine or routines that would unlock the node, sleep, then re-lock and
> move on. But it's probably cleaner to do as you suggest and just make sure
> it's ok to do the write before starting it.
>
> Take care,
>
> Bill
--
Juergen Hannken-Illjes - hannken@eis.cs.tu-bs.de - TU Braunschweig (Germany)