Subject: Re: kern/30831
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Antti Kantee <pooka@cs.hut.fi>
List: netbsd-bugs
Date: 04/03/2007 11:20:02
The following reply was made to PR kern/30831; it has been noted by GNATS.
From: Antti Kantee <pooka@cs.hut.fi>
To: Patrick Welche <prlw1@newn.cam.ac.uk>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/30831
Date: Tue, 3 Apr 2007 14:16:49 +0300
On Tue Apr 03 2007 at 12:00:06 +0100, Patrick Welche wrote:
> On Mon, Apr 02, 2007 at 11:14:42PM +0300, Antti Kantee wrote:
> > Patrick: can you get a ps listing out of the kernel? Anything sleeping
> > with the wait channel smb* (probably smbirq, although I'm not familiar
> > with the smb code)?
>
> Sadly no:
>
> # ps -M netbsd.1.core
> ps: can't read pgrp at 0x0: Undefined error: 0
>
> (and my ps/l didn't work because I didn't sync (reboot 0x104))
Maybe try xps from /sys/gdbscripts/xps instead?
> Nice point: "Other file systems get lucky because they don't sleep in reclaim."
>
> I was playing spot-the-difference with ffs I didn't spot one...
When smbfs does vrele() for the parent directory during reclaim, it
might end up contacting the server. There is a definite time window
between issuing the request to the server and getting the response back,
and smbfs sleeps for that time. During this window the node being
reclaimed is in a bad state.
Local file systems don't have this problem because they don't have
network delay. I am not sure whether the vrele() could make them wait
for the disk instead. In any case, they seem to do the reclaim operation
in a slightly different order and call vrele() earlier.
If the problem is easy to repeat, please check whether this patch/hack
makes it go away:
Index: smbfs_vfsops.c
===================================================================
RCS file: /cvsroot/src/sys/fs/smbfs/smbfs_vfsops.c,v
retrieving revision 1.63
diff -u -r1.63 smbfs_vfsops.c
--- smbfs_vfsops.c 12 Mar 2007 18:18:32 -0000 1.63
+++ smbfs_vfsops.c 3 Apr 2007 11:15:32 -0000
@@ -456,7 +456,13 @@
 			goto loop;
 		simple_lock(&vp->v_interlock);
 		nvp = TAILQ_NEXT(vp, v_mntvnodes);
+
 		np = VTOSMB(vp);
+		if (np == NULL) {
+			simple_unlock(&vp->v_interlock);
+			continue;
+		}
+
 		if ((vp->v_type == VNON || (np->n_flag & NMODIFIED) == 0) &&
 		    LIST_EMPTY(&vp->v_dirtyblkhd) &&
 		     vp->v_uobj.uo_npages == 0) {
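
For illustration only, here is a userland sketch of what the check in the
hack is meant to do: walk the list of vnodes and skip any whose private
node pointer is already gone, instead of dereferencing it. The names
fake_node, fake_vnode and count_dirty are made up for this example and
are not kernel interfaces. The sketch also assumes that a vnode caught in
the reclaim window shows up with a NULL private pointer, which is what
the np == NULL test above guesses at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct smbnode (not the kernel type). */
struct fake_node {
	int modified;			/* plays the role of NMODIFIED */
};

/* Hypothetical stand-in for struct vnode (not the kernel type). */
struct fake_vnode {
	struct fake_node *v_data;	/* assumed NULL while being reclaimed */
	struct fake_vnode *next;	/* next vnode on the mount's list */
};

/*
 * Count the vnodes that would need flushing.  A vnode without private
 * data is skipped rather than dereferenced, mirroring the "continue"
 * added by the patch.
 */
static int
count_dirty(struct fake_vnode *head)
{
	struct fake_vnode *vp;
	int dirty = 0;

	for (vp = head; vp != NULL; vp = vp->next) {
		struct fake_node *np = vp->v_data;

		if (np == NULL)		/* half-reclaimed: leave it alone */
			continue;
		if (np->modified)
			dirty++;
	}
	return dirty;
}
```

Calling count_dirty() on a list that contains a vnode with v_data set to
NULL simply passes over that entry. Without the check, the same walk
would dereference a NULL pointer at np->modified.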
--
Antti Kantee <pooka@iki.fi> Of course he runs NetBSD
http://www.iki.fi/pooka/ http://www.NetBSD.org/
"la qualité la plus indispensable du cuisinier est l'exactitude"