Subject: kern/1429: mfs can hang at shutdown (+ fix)
To: None <gnats-bugs@gnats.netbsd.org>
From: John Kohl <jtk@kolvir.arlington.ma.us>
List: netbsd-bugs
Date: 08/31/1995 21:27:46
>Number: 1429
>Category: kern
>Synopsis: mfs can hang at shutdown (+ fix)
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Aug 31 21:50:02 1995
>Last-Modified:
>Originator: John Kohl
>Organization:
NetBSD Kernel Hackers `R` Us
>Release: NetBSD-current as of 31 August 1995
>Environment:
System: NetBSD kolvir 1.0A NetBSD 1.0A (KOLVIR) #624: Wed Aug 9 07:38:58 EDT 1995 jtk@pattern:/u1/NetBSD-current/src/sys/arch/i386/compile/KOLVIR i386
>Description:
When shutting down, it's possible for an MFS file system to hang in
unmount. The problem is that the MFS process is woken up to process file
I/O and is then delivered a signal before it gets onto the processor.
tsleep() then returns an error indicating the signal delivery, and MFS
goes straight to the unmount attempt without checking its I/O queue.
The unmount hangs down in spec_fsync() because it's waiting for I/O to
complete on the MFS "device" vnode, and that I/O is still sitting on the
queue that the MFS process itself was supposed to service. The stack
trace at this point looks like:
db> tr *0xf9dbd03c
bpendsleep(f877bfb4,0,f9dbed9c,f8137006,f877bfb4) at bpendsleep
bpendsleep(f877bfb4,11) at bpendsleep
_spec_fsync(f9dbedb8,f877d200,0,f877b700,f877d000) at _spec_fsync+0xa6
_ffs_sync(f877d200,1,f873d480,f8782b00,f877d200) at _ffs_sync+0x142
_dounmount(f877d200,0,f8782b00) at _dounmount+0x64
_mfs_start(f877d200,0,f8782b00,f877d200,f877d21c) at _mfs_start+0x91
_mount(f8782b00,f9dbef84,f9dbef7c,0,1da94) at _mount+0x476
_syscall() at _syscall+0x239
--- syscall (number 21) ---
0x6e57:
and its device vnode is:
db> call vprint(0,0xf877bf80)
type VBLK, usecount 5, writecount 0, refcount 4, flags (VBWAIT)
tag VT_MFS, pid 20, base 200704, size 5242880
If I examine the vnode's data field as an mfsnode, it has entries on its
I/O queue:
db> x/x 0xf8783b80
0xf8783b80: f877bf80
db>
0xf8783b84: 31000
db>
0xf8783b88: 500000
db>
0xf8783b8c: 14
db>
0xf8783b90: f8d405cc << this guy is the mfs_buflist member!
db>
0xf8783b94: deadbeef
db>
0xf8783b98: deadbeef
db>
0xf8783b9c: deadbeef
db>
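For anyone reading the dump above, here is a small sketch of how I line
the words up against the mfsnode fields. The struct name mfsnode_words
and both helper functions are made up for illustration, and the field
order is an assumption (it follows the usual mfsnode layout, and it is
consistent with the vprint output: 0x31000 is base 200704, 0x500000 is
size 5242880, 0x14 is pid 20). The trailing deadbeef words are left
uninterpreted.

```c
#include <stdint.h>

/*
 * Hypothetical 32-bit mirror of the leading mfsnode fields.
 * Field order is an assumption; it matches the vprint() output.
 */
struct mfsnode_words {
    uint32_t mfs_vnode;    /* back pointer to the device vnode */
    uint32_t mfs_baseoff;  /* base of the file system in memory */
    uint32_t mfs_size;     /* size of the memory file system */
    uint32_t mfs_pid;      /* pid of the supporting process */
    uint32_t mfs_buflist;  /* head of the queued I/O requests */
};

/* The five words copied from the ddb x/x output in this report. */
static const struct mfsnode_words dumped = {
    0xf877bf80u, 0x31000u, 0x500000u, 0x14u, 0xf8d405ccu
};

/* Nonzero when the dumped words agree with the vprint() line. */
static int dump_matches_vprint(void)
{
    return dumped.mfs_vnode == 0xf877bf80u
        && dumped.mfs_baseoff == 200704u
        && dumped.mfs_size == 5242880u
        && dumped.mfs_pid == 20u;
}

/*
 * Nonzero when requests are still queued: the head is neither null
 * nor the (struct buf *)(-1) value that mfs_start() treats as "exit".
 */
static int has_pending_io(const struct mfsnode_words *m)
{
    return m->mfs_buflist != 0 && m->mfs_buflist != 0xffffffffu;
}
```

With the values above, mfs_buflist is a real pointer rather than null or
-1, which is what tells me there is unserviced I/O on the queue.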
>How-To-Repeat:
Run lots of stuff, using MFS for /tmp. Shut down with "shutdown -r
now". Get unlucky in the order of process activation and signal
delivery, and your shutdown hangs.
>Fix:
Scan the I/O queue again when tsleep() returns an error, before trying
the unmount. I had a debug printf in there once that actually spit out
some I/O messages right before one unmount attempt, so I'm fairly
certain this fix works.
*** mfs_vfsops.c 1995/09/01 00:42:44 1.1
--- mfs_vfsops.c 1995/09/01 01:17:38
***************
*** 272,291 ****
base = mfsp->mfs_baseoff;
while (mfsp->mfs_buflist != (struct buf *)(-1)) {
! while (bp = mfsp->mfs_buflist) {
! mfsp->mfs_buflist = bp->b_actf;
! mfs_doio(bp, base);
! wakeup((caddr_t)bp);
}
/*
* If a non-ignored signal is received, try to unmount.
* If that fails, clear the signal (it has been "processed"),
* otherwise we will loop here, as tsleep will always return
* EINTR/ERESTART.
*/
! if (error = tsleep((caddr_t)vp, mfs_pri, "mfsidl", 0))
if (dounmount(mp, 0, p) != 0)
CLRSIG(p, CURSIG(p));
}
return (error);
}
--- 272,295 ----
base = mfsp->mfs_baseoff;
while (mfsp->mfs_buflist != (struct buf *)(-1)) {
! #define DOIO() \
! while (bp = mfsp->mfs_buflist) { \
! mfsp->mfs_buflist = bp->b_actf; \
! mfs_doio(bp, base); \
! wakeup((caddr_t)bp); \
}
+ DOIO();
/*
* If a non-ignored signal is received, try to unmount.
* If that fails, clear the signal (it has been "processed"),
* otherwise we will loop here, as tsleep will always return
* EINTR/ERESTART.
*/
! if (error = tsleep((caddr_t)vp, mfs_pri, "mfsidl", 0)) {
! DOIO();
if (dounmount(mp, 0, p) != 0)
CLRSIG(p, CURSIG(p));
+ }
}
return (error);
}
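To show why the extra DOIO() matters, here is a minimal user-space model
of one pass through the loop. It is only a sketch under stated
assumptions: fake_buf, fake_mfs, drain_queue() and pending_at_unmount()
are hypothetical names, the "sleep" is modeled by queueing one request
and pretending a signal arrived before the process ran, and no kernel
interfaces are used.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct buf; only the queue link matters. */
struct fake_buf {
    struct fake_buf *next;  /* plays the role of b_actf */
    int done;               /* set once the "I/O" has been performed */
};

/* Hypothetical stand-in for struct mfsnode. */
struct fake_mfs {
    struct fake_buf *buflist;  /* plays the role of mfs_buflist */
};

/* Service every queued request, as DOIO() does in the patch. */
static int drain_queue(struct fake_mfs *m)
{
    struct fake_buf *bp;
    int n = 0;

    while ((bp = m->buflist) != NULL) {
        m->buflist = bp->next;
        bp->done = 1;  /* stands in for mfs_doio() plus wakeup() */
        n++;
    }
    return n;
}

/*
 * Model one loop pass in which a request is queued and a signal then
 * interrupts the sleep. Returns how many requests are still pending
 * at the point where the unmount would be attempted.
 */
static int pending_at_unmount(struct fake_mfs *m, struct fake_buf *arriving,
                              int drain_after_signal)
{
    struct fake_buf *bp;
    int pending = 0;

    drain_queue(m);            /* top of loop: queue is empty afterwards */

    arriving->next = m->buflist;  /* request queued during the "sleep" */
    m->buflist = arriving;

    if (drain_after_signal)    /* the extra DOIO() added by the patch */
        drain_queue(m);

    for (bp = m->buflist; bp != NULL; bp = bp->next)
        pending++;
    return pending;
}
```

Calling pending_at_unmount() with drain_after_signal set to 0 models the
old code and leaves the request on the queue when the unmount starts;
setting it to 1 models the patched code and leaves nothing pending.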
>Audit-Trail:
>Unformatted: