Subject: NFS hangs on 2.0 client
To: None <tech-net@netbsd.org>
From: Jeff Rizzo <riz@tastylime.net>
List: tech-net
Date: 01/07/2005 09:45:28
I had a brief discussion about this on netbsd-help, but the problem just
happened again, so I thought I'd solicit wider opinions.
I have an NFS client and NFS server both running NetBSD/i386 2.0, with
MP kernels. The network interfaces are both fxp (Intel 82559), and they
are connected by a LAN switch which appears to be operating normally;
non-NFS traffic appears to go correctly between the hosts, and a second
NetBSD/i386 box (running 2.0_BETA) is accessing (albeit read-only) a
share from the server without problems. The mounts are all UDP. (The
hanging mounts are rw, and have the "soft" and "intr" flags.)
The symptoms are these: upon a fresh boot, the client can access NFS
shares on the server just fine; I'm doing pkgsrc bulk builds on the
client, and storing the built packages on one of the NFS volumes. After
some period of time (this time it was ~36 hours), all NFS accesses from
this client hang. From what I can tell using tcpdump, an 'ls' on the
NFS share generates NO traffic between client and server. Following
suggestions from the mount_nfs man page, I looked at the output of
'netstat -s' for info on fragments and UDP.
On the client, I see:
1598454 fragments received
1 fragment dropped (dup or out of space)
0 fragments dropped (out of ipqent)
0 malformed fragments dropped
44 fragments dropped after timeout
<snip>
udp:
6704322 datagrams received
0 with incomplete header
0 with bad data length field
2 with bad checksum
76 dropped due to no socket
0 broadcast/multicast datagrams dropped due to no socket
26 dropped due to full socket buffers
6704218 delivered
6700262 PCB hash misses
6842098 datagrams output
On the server, I see:
6085968 fragments received
1 fragment dropped (dup or out of space)
0 fragments dropped (out of ipqent)
0 malformed fragments dropped
277 fragments dropped after timeout
<snip>
udp:
8236329 datagrams received
0 with incomplete header
0 with bad data length field
0 with bad checksum
862 dropped due to no socket
0 broadcast/multicast datagrams dropped due to no socket
0 dropped due to full socket buffers
8235467 delivered
8025761 PCB hash misses
8247713 datagrams output
The "fragments dropped after timeout" counter does not appear to be
incrementing on either host, and it also does not seem overly large to
me, given the total number of fragments received. 'nfsstat -w 1' on the
client shows _no_ activity.
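For anyone who wants to check the "not overly large" and "not incrementing" claims for themselves, here is a rough Python sketch. It assumes counter lines in the "<number> <description>" shape quoted above. The helper names (parse_counters, timeout_drop_fraction, counter_delta) are made up for illustration and are not part of netstat or any NetBSD tool. Only the fragment counters quoted in this message are used as input.

```python
import re

# Matches "netstat -s" style counter lines, e.g. "44 fragments dropped after timeout".
COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")

def parse_counters(text):
    """Map each counter description to its integer value (hypothetical helper)."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters

def timeout_drop_fraction(counters):
    """Fraction of received fragments that were dropped after timeout."""
    dropped = counters["fragments dropped after timeout"]
    return dropped / counters["fragments received"]

def counter_delta(before, after, name):
    """Change in one counter between two snapshots; 0 means it is not incrementing."""
    return after[name] - before[name]

# Fragment counters copied from the client and server output quoted above.
client = parse_counters("""
1598454 fragments received
44 fragments dropped after timeout
""")
server = parse_counters("""
6085968 fragments received
277 fragments dropped after timeout
""")

print(f"client timeout-drop fraction: {timeout_drop_fraction(client):.6%}")
print(f"server timeout-drop fraction: {timeout_drop_fraction(server):.6%}")
```

To check whether a counter is still moving, one could save 'netstat -s' output twice, a few minutes apart, run both files through parse_counters, and pass the results to counter_delta with the name "fragments dropped after timeout".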
A reboot of the client clears this up, so I'm going to leave the system
in this state for a little while, in case anyone has suggestions for
what I might check. Does anyone have any thoughts?
Thanks,
+j