Subject: Re: More data. Re: kernel panic in nfs_reclaim (kern/17107)
To: Artem Belevich <art@riverstonenet.com>
From: Christos Zoulas <christos@zoulas.com>
List: tech-kern
Date: 10/02/2002 15:12:04
On Oct 2, 11:23am, art@riverstonenet.com (Artem Belevich) wrote:
-- Subject: More data. Re: kernel panic in nfs_reclaim (kern/17107)
Ok, we are narrowing this down. Whatever unmounts the filesystem is
not paying attention to flushing the vnode. Could that be happening
*during* the unmount? I.e., might we be calling nfs_reclaim while
nfs_unmount is in progress? Does the following help?
christos
Index: nfs_node.c
===================================================================
RCS file: /cvsroot/syssrc/sys/nfs/nfs_node.c,v
retrieving revision 1.53
diff -u -u -r1.53 nfs_node.c
--- nfs_node.c 2002/03/16 23:05:25 1.53
+++ nfs_node.c 2002/10/02 19:11:19
@@ -272,7 +272,8 @@
} */ *ap = v;
struct vnode *vp = ap->a_vp;
struct nfsnode *np = VTONFS(vp);
- struct nfsmount *nmp = VFSTONFS(vp->v_mount);
+ extern struct simplelock mntvnode_slock;
+ struct nfsmount *nmp;
if (prtactive && vp->v_usecount != 0)
vprint("nfs_reclaim: pushing active", vp);
@@ -282,9 +283,12 @@
/*
* For nqnfs, take it off the timer queue as required.
*/
+ simple_lock(&mntvnode_slock);
+ nmp = VFSTONFS(vp->v_mount);
if ((nmp->nm_flag & NFSMNT_NQNFS) && np->n_timer.cqe_next != 0) {
CIRCLEQ_REMOVE(&nmp->nm_timerhead, np, n_timer);
}
+ simple_unlock(&mntvnode_slock);
/*
* Free up any directory cookie structures and
christos
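
The idea behind the patch can be illustrated outside the kernel. What follows is a minimal user-space sketch, not NetBSD code: the mount pointer is read only while holding a lock that the unmount side is assumed to take as well. All names here (mount_model, vnode_model, reclaim_model, unmount_model, mntvnode_lock) are hypothetical, a pthread mutex stands in for the simplelock, and the NULL check is an addition that the patch itself does not make.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct nfsmount and struct vnode. */
struct mount_model {
    int timer_entries;              /* models the length of nm_timerhead */
};

struct vnode_model {
    struct mount_model *v_mount;    /* may be detached by the unmount side */
    int on_timer_queue;             /* models n_timer.cqe_next != 0 */
};

/* Stands in for mntvnode_slock; both sides must use the same lock. */
static pthread_mutex_t mntvnode_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Reclaim side: fetch v_mount only after taking the lock, so the pointer
 * cannot be detached between the read and the queue update.
 * Returns 1 if the node was taken off the timer queue, 0 otherwise.
 */
int reclaim_model(struct vnode_model *vp)
{
    struct mount_model *mp;
    int removed = 0;

    assert(vp != NULL);
    pthread_mutex_lock(&mntvnode_lock);
    mp = vp->v_mount;
    if (mp != NULL && vp->on_timer_queue) {   /* NULL check: not in the patch */
        mp->timer_entries--;
        vp->on_timer_queue = 0;
        removed = 1;
    }
    pthread_mutex_unlock(&mntvnode_lock);
    return removed;
}

/*
 * Unmount side (assumption): detaches the vnode from its mount under the
 * same lock before the mount structure is freed.
 */
void unmount_model(struct vnode_model *vp)
{
    assert(vp != NULL);
    pthread_mutex_lock(&mntvnode_lock);
    vp->v_mount = NULL;
    pthread_mutex_unlock(&mntvnode_lock);
}
```

The lock only helps if the unmount path really does detach or free the mount while holding it; if it does not, reading v_mount under the lock narrows the window but does not close it.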
| I got the panic tonight and I still have the machine in DDB. I think I
| can keep it this way for a couple more hours. So if somebody would like
| to get more info from DDB, I'd be happy to type commands for you.
|
| Here's the stack trace. This time from 1.6 GENERIC_DIAGNOSTIC kernel.
|
| nfs_reclaim(e6200c54,8,0,c02a6953,e47dcc9c) at nfs_reclaim+0x54
| VOP_RECLAIM(e4cd70f4,e3c42740,200000,0) at VOP_RECLAIM+0x2e
| vclean(e4cd70f4,8,e3c42740,c025eb3c) at vclean+0x107
| vgonel(e4cd70f4,e3c42740,0,c026034e) at vgonel+0x46
| getnewvnode(1,c10a4200,c0f7ef00,e6200d4c,0) at getnewvnode+0x210
| ffs_vget(c10a4200,56b198,e6200dd8,e3c42740,e58bfcb4) at ffs_vget+0x4f
| ufs_lookup(e6200e10,30002,e6200e20,c02b14f9,e6200ef8) at ufs_lookup+0x74a
| VOP_LOOKUP(e58bfcb4,e6200f08,e6200f1c,c02aac3a,e58bfcb4) at VOP_LOOKUP+0x35
| lookup(e6200ef8,e758d000,400,e6200f10,e6200f80) at lookup+0x2a4
| namei(e6200ef8,e57fd77c,e6200f1c,2) at namei+0x2f1
| sys_unlink(e3c42740,e6200f80,e6200f78,c0375e0f) at sys_unlink+0x3f
| syscall_plain(1f,1f,1f,1f,0) at syscall_plain+0xa7
|
|
| I've checked the VNODE and v_data and v_mount pointers:
|
| db> show vnode e4cd70f4
| OBJECT 0xe4cd70f4: locked=0, pgops=0xc0663f64, npages=0, refs=0
|
| VNODE flags 100<XLOCK>
| mp 0xc1882200 numoutput 0 size 0xffffffffffffffff
| data 0xe6e3fb98 usecount 0 writecount 0 holdcnt 0 numoutput 0
| type VNON(0) tag VT_NFS(2) id 0xc3c7ed mount 0xc1882200 typedata 0x0
|
| db> show object 0xe4cd70f4
| OBJECT 0xe4cd70f4: locked=0, pgops=0xc0663f64, npages=0, refs=0
|
| db> x 0xc0663f64
| uvm_vnodeops: 0
|
| v->v_data (nfsnode) seems to be OK. At least it points back to the vnode:
| v->v_data->n_vnode == 0xe4cd70f4
|
| db> x/m 0xe6e3fb98,40
| 0xe6e3fb98: 50dd65c0 00000000 00000000 00000000 P.e.............
| 0xe6e3fba8: 00000000 00000000 552e50c0 ffffffff ........U.P.....
| 0xe6e3fbb8: 08000000 00000000 00000000 00000000 ................
| 0xe6e3fbc8: 00000000 00000000 00000000 00000000 ................
| 0xe6e3fbd8: 00000000 00000000 00000000 00000000 ................
| 0xe6e3fbe8: 00000000 00000000 e0e928c1 00000000 ..........(.....
| 0xe6e3fbf8: 00000000 00000000 00000000 3cfce3e6 ............<...
| 0xe6e3fc08: 804d9be6 f470cde4 00000000 00000000 .M...p..........
| 0xe6e3fc18: 00000000 00000000 00000000 00000000 ................
| 0xe6e3fc28: 00000000 00000000 00000000 00000000 ................
| 0xe6e3fc38: 20000000 346e3700 321d8700 20000000 ...4n7.2... ...
| 0xe6e3fc48: 00376e34 321d8700 3b7c0000 21411a00 .7n42...;|..!A..
| 0xe6e3fc58: 6f4f0400 00000000 00000000 00000000 oO..............
| 0xe6e3fc68: 00000000 00000000 00000000 00000000 ................
| 0xe6e3fc78: 00000000 ffffffff 00000000 00000000 ................
| 0xe6e3fc88: 00000000 00000000 00000000 00000000 ................
|
| Here is what the v->v_mount pointer refers to, and the data doesn't look
| good to me. The mount point has been freed and had type M_UVMAMAP (0x52 == 82).
|
| db> x/m 0xc1882200,40
| 0xc1882200: efbeadde 5200adde 00c688c1 efbeadde ....R...........
| 0xc1882210: efbeadde efbeadde efbeadde efbeadde ................
| 0xc1882220: 08000000 09000000 0a000000 0b000000 ................
| 0xc1882230: 0c000000 0d000000 0e000000 0f000000 ................
| 0xc1882240: 10000000 11000000 12000000 13000000 ................
| 0xc1882250: 14000000 15000000 16000000 17000000 ................
| 0xc1882260: 18000000 19000000 1a000000 1b000000 ................
| 0xc1882270: 1c000000 1d000000 1e000000 1f000000 ................
| 0xc1882280: 20000000 21000000 22000000 23000000 ...!..."...#...
| 0xc1882290: 24000000 25000000 26000000 27000000 $...%...&...'...
| 0xc18822a0: 28000000 29000000 2a000000 2b000000 (...)...*...+...
| 0xc18822b0: 2c000000 2d000000 2e000000 2f000000 ,...-......./...
| 0xc18822c0: 30000000 31000000 32000000 33000000 0...1...2...3...
| 0xc18822d0: 34000000 35000000 36000000 37000000 4...5...6...7...
| 0xc18822e0: 38000000 39000000 3a000000 3b000000 8...9...:...;...
| 0xc18822f0: 3c000000 3d000000 3e000000 3f000000 <...=...>...?...
|
| --Artem
|
| On Mon, Sep 30, 2002 at 07:59:10PM -0400, Christos Zoulas <christos@zoulas.com> wrote:
| > On Sep 30, 3:55pm, art@riverstonenet.com (Artem Belevich) wrote:
| > -- Subject: Re: kernel panic in nfs_reclaim (kern/17107)
| >
| > Is the rest of the vnode valid?
| >
| > christos
| >
| > | This was the first thing I tried. The kernel survived for a bit longer
| > | - something like 3-4 days instead of the usual nightly panic attack, but
| > | finally it crashed in the same place with nmp=0xc. This suggests
| > | that the vnode's vp->v_mount has already been reused for something else.
| > |
| > | This crash confuses me a little - if the filesystem is unmounted,
| > | shouldn't all vnodes associated with it be gone? If so, then how come
| > | this particular rogue vnode was still around?
| > |
| > | --Artem
| > |
| > |
| > -- End of excerpt from Artem Belevich
| >
| >
-- End of excerpt from Artem Belevich
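
The two x/m dumps in Artem's message can be decoded by hand. The dump lists bytes in memory order, so on a little-endian machine each group of four bytes has to be reversed to get the 32-bit value; the group f470cde4 in the nfsnode dump matching the vnode address 0xe4cd70f4 suggests that this is the right reading. A minimal sketch of that decoding follows. The helper names le32 and low16 are hypothetical, and the bytes to feed in are copied from the dumps above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assemble a 32-bit value from four bytes given in memory order,
 * assuming little-endian layout (lowest address holds the low byte).
 */
uint32_t le32(const uint8_t b[4])
{
    assert(b != NULL);
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Low 16 bits of a 32-bit word. */
uint16_t low16(uint32_t w)
{
    return (uint16_t)(w & 0xffffu);
}
```

Read this way, the first group at 0xc1882200 (ef be ad de) is 0xdeadbeef and the second (52 00 ad de) is 0xdead0052, whose low 16 bits are 0x52 = 82, the value Artem reads as the malloc type.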