Subject: race conditions in dounmount...
To: None <tech-kern@NetBSD.ORG>
From: Bill Sommerfeld <sommerfeld@orchard.medford.ma.us>
List: tech-kern
Date: 06/18/1996 14:32:58
-----BEGIN PGP SIGNED MESSAGE-----
I observed some nasty races between unmount(2) and vfs_unmountall()
(which called at shutdown time) on a filesystem where unmount() was
very slow (a modified version of AFS).
If an unmount was in progress when vfs_unmountall() was entered, the
following race occurred (more or less):
A0) unmount called.
A1) MP marked "busy" with vfs_busy()
B0) reboot called.
B1) vfs_unmountall() calls dounmount on same FS; winds up blocked in
vfs_busy() waiting for the MP point to become unbusy.
A2) unmount completes; MP marked unbusy with vfs_unbusy(); B marked runnable
A3) file system freed by free().
A4) unmount returns; process exits.
B2) vfs_unbusy wakes up, dounmount continues operating on freed MP.
B3) "shit happens":
In particular, we crash dereferencing NULL attempting to fetch
mp->mnt_op->vfs_sync.
[Because this kernel is compiled with DIAGNOSTIC, and because of
Murphy's Law,
offsetof(struct mount, mnt_op) == offsetof(struct freelist, next);
The "freelist.next" pointer gets set to NULL by free(), and then
VFS_SYNC() crashes attempting to dereference a NULL mnt_op vector.]
- ---
I also observed the following bit of well-meaning, but useless code:
After blocking until the mount point becomes unbusy, vfs_busy contains
code to fail and return 1 if MNT_UNMOUNT is set in the MP. However,
dounmount's code is roughly:
if (vfs_busy(mp))
return EBUSY;
mp->mnt_flag |= MNT_UNMOUNT;
... call underlying VFS to do unmount ...
mp->mnt_flag &= ~MNT_UNMOUNT;
vfs_unbusy(mp);
so vfs_busy should never see MNT_UNMOUNT set, and thus it always
returns zero.
- ----
In short, it looks like dounmount is one big race condition; the
attempts to avoid it in vfs_busy/vfs_unbusy fail miserably because the
mp gets freed while other processes are still blocked in vfs_busy()...
Suggested *rough* fix [there are probably some bugs here, ...]:
0) don't clear MNT_UNMOUNT; leave it set until the mp is freed..
1) add a mp->mnt_waiters field
2) change vfs_busy to be roughly:
mp->mnt_waiters++;
while (mp->mnt_flag & <lock bit>) {
mp->mnt_flag |= <want bit>;
tsleep(&mp->mnt_flag, ...)
}
mp->mnt_waiters--;
if (mp->mnt_flag & MNT_UNMOUNT) {
wakeup(&mp->mnt_waiters, ...)
return 1;
}
add add a vfs_unbusy_final() called by dounmount, which is roughly:
vfs_unbusy(mp);
while (mp->mnt_waiters > 0) {
tsleep(&mp->mnt_waiters, ...);
}
Two questions for those of you with more experience in this chunk of code:
1) is my interpretation of this correct (if so, I'll send-pr this..)
2) is my idea for how to fix this correct, or am I headed in the
wrong direction?
- Bill
-----BEGIN PGP SIGNATURE-----
Version: 2.6.1
iQCVAwUBMca+F7T+rHlVUGpxAQEAWAP/RTyhaSpTnPGazoBu2GrCxKL1zI2Wz7CL
5gT06dIdbJSKUmCTAyvThHiGCsSW0sht0uU1cN8O0mPk+zyB7GbcdcvWrk8OwJ6a
TFn8h0/vu4EvY/MB6xuUPjm2hMiy2ko9k57rEnW/xqCw6uCOvw2RspyAS2T+avE5
FS0Pqj9hQf0=
=a09w
-----END PGP SIGNATURE-----