Subject: NFS problems
To: None <tech-kern@netbsd.org>
From: None <rick@snowhite.cis.uoguelph.ca>
List: tech-kern
Date: 07/23/2002 11:53:15
Well, I'm not directly involved in the NetBSD code, but here are a few
random comments that might be useful:
- If you use too large a blocksize with NFS over UDP, you'll
see "IP Fragments dropped due to timeout" when you run "netstat -s".
(These indicate that fragments of the large UDP datagrams aren't
making it through the network interconnect. Since losing any one
fragment loses the whole datagram, they cause serious performance
degradation. The "fix" is to either reduce the read/write
data size or switch to TCP. See "man mount_nfs" to find out how to
do either of these.)
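To put rough numbers on the fragmentation point, here is a small Python sketch. It assumes IPv4 with a 20-byte header (no options), an 8-byte UDP header and a 1500-byte Ethernet MTU. It treats the transfer size as the whole UDP payload, ignoring RPC and NFS header overhead, and it models each fragment as lost independently with probability `loss`, which is a simplification of real network behaviour. The function names are made up for illustration.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options (assumption)
UDP_HEADER = 8   # bytes

def fragment_count(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)

def delivery_probability(fragments: int, loss: float) -> float:
    """Chance the whole datagram survives if each fragment is lost
    independently with probability `loss`; one lost fragment loses it all."""
    return (1.0 - loss) ** fragments

for size in (8192, 32768):
    n = fragment_count(size)
    print(f"{size:6d} bytes -> {n:2d} fragments, "
          f"P(delivered) at 1% loss = {delivery_probability(n, 0.01):.3f}")
```

Under these assumptions an 8 KB datagram needs 6 fragments and a 32 KB one needs 23, so at 1% per-fragment loss the larger datagram arrives intact only about 79% of the time versus about 94%, and each failure means waiting for the client to time out and retransmit the whole request.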
- For NFS Version 2, the specification (RFC 1094) stipulated a maximum
of 8,192 bytes of data per read or write, so using a larger blocksize for
V2 violates the spec. (I'm sure some implementations do it, and I'm sure
some of those work, but if you want to be technically correct, you should
only allow block sizes > 8192 for V3.)
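A client or mount utility could enforce that rule when choosing transfer sizes. The sketch below is hypothetical: the names `NFSV2_MAXDATA` and `clamp_transfer_size` are invented here. The 8,192-byte V2 limit comes from RFC 1094, while the V3 limit is simply passed in as `v3_max`, since for V3 the server reports its own maxima (rtmax and wtmax in the FSINFO reply defined by RFC 1813).

```python
NFSV2_MAXDATA = 8192  # RFC 1094 maximum data bytes per READ/WRITE

def clamp_transfer_size(version: int, requested: int, v3_max: int = 32768) -> int:
    """Return a read/write size that stays within the protocol's limit.

    `v3_max` stands in for whatever maximum the V3 server advertises;
    the default here is an arbitrary placeholder, not a spec value.
    """
    if requested <= 0:
        raise ValueError("transfer size must be positive")
    if version == 2:
        return min(requested, NFSV2_MAXDATA)
    if version == 3:
        return min(requested, v3_max)
    raise ValueError(f"unsupported NFS version: {version}")

print(clamp_transfer_size(2, 32768))  # 8192: V2 never exceeds the spec limit
print(clamp_transfer_size(3, 32768))  # 32768: allowed for V3 under the assumed maximum
```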
- The NFS code is much more sensitive to buffer cache race conditions
than local file systems are. One reason among several is that local
file systems almost never take several seconds to complete an I/O
operation. (Mounting a really slow NFS server, like one with debugging
printfs turned on, is the best way to "find" these. Been there, have the
T-shirt. Now, once you "find" them, fixing them can be great fun.) In the
past, when I saw mysterious intermittent hangs, they usually turned out to
be buffer cache bugs. "ps axl" should give you a hint, based on what the
processes are waiting on (the WCHAN column).
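When chasing a hang like this, it can help to tally which wait channel the stuck processes are sleeping on. Below is a rough Python sketch that does this over `ps axl` text. It assumes the output has a header line containing a `WCHAN` column and that every field before COMMAND is a single whitespace-free token; column layout varies between systems, so check yours. The sample text is made up, and the channel names in it are placeholders rather than real kernel wait channels.

```python
from collections import Counter

def wait_channels(ps_output: str) -> Counter:
    """Count processes per wait channel in `ps axl`-style text."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    header = lines[0].split()
    col = header.index("WCHAN")
    counts = Counter()
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the split.
        fields = line.split(maxsplit=len(header) - 1)
        counts[fields[col]] += 1
    return counts

# Made-up sample; real column sets and channel names differ by system.
SAMPLE = """\
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 101 1 0 10 0 100 50 chan_a D ?? 0:00.01 cp big.file /nfs/dir
0 102 1 0 10 0 100 50 chan_a D ?? 0:00.02 ls -l /nfs/dir
0 103 1 0 10 0 100 50 chan_b S ?? 0:00.00 sh
"""

for chan, n in wait_channels(SAMPLE).most_common():
    print(chan, n)
```

To run it against a live system, the text could come from `subprocess.run(["ps", "axl"], capture_output=True, text=True).stdout`. Many processes piled up on one channel is only a hint about where things are stuck, not a diagnosis.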
Good luck with it. Finding these can be lots of fun,
rick
P.S. This is why I choose to do my development on a really old version of
BSD, since I'd rather avoid buffer cache surprises when trying to get
other things working :-)