tech-kern: NFS problems

Subject: NFS problems
To: None <tech-kern@netbsd.org>
From: None <rick@snowhite.cis.uoguelph.ca>
List: tech-kern
Date: 07/23/2002 11:53:15
Well, I'm not directly involved in the NetBSD code, but here are a few
random comments that might be useful:

- If you have too large a blocksize when using NFS over UDP, you'll
  see "IP Fragments dropped due to timeout" when you do "netstat -s".
  (These indicate that fragments of the large UDP datagram aren't
   making it through the network interconnect, which causes serious
   performance degradation. The "fix" is to either reduce the read/write
   data size or switch to TCP. See "man mount_nfs" to find out how to
   do either of these.)

- For NFS Version 2, the spec. (RFC 1094) stipulated a maximum of 8,192 bytes,
  so using a larger blocksize for V2 violates the spec. (I'm sure
  implementations do it and I'm sure some work, but if you want to be
  technically correct, you should only allow block sizes > 8192 for V3.)

- The NFS code is much more sensitive to buffer cache race conditions
  than local file systems. One reason, among others, is that local
  file systems almost never take several seconds to do an I/O operation.
  (Mounting a really slow NFS server, like one with debugging printfs
   turned on, is the best way to "find" these. Been there, have the T-shirt.
   Now, once you "find" them, fixing them can be great fun.) In the past,
  when I saw mysterious intermittent hangs, it usually turned out to be
  buffer cache bugs. "ps axl" should give you a hint, based on what the
  processes are waiting on.
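
To put rough numbers on the first two points, here is a small Python
sketch (not taken from any NFS implementation). It estimates how many IP
fragments a single NFS-over-UDP read/write needs and checks a block size
against the 8,192 byte V2 limit. The 1500 byte MTU, the 128 byte allowance
for RPC/NFS headers, and all function names are assumptions made for
illustration only. Since losing any one fragment loses the whole datagram,
the loss estimate treats fragment drops as independent, which real
networks may not honour.

```python
import math

IP_HEADER = 20          # bytes, IPv4 header without options
UDP_HEADER = 8          # bytes
NFS_V2_MAX_DATA = 8192  # maximum transfer size for NFS Version 2 (RFC 1094)


def ip_fragments(block_size, mtu=1500, rpc_overhead=128):
    """Estimate the IP fragments needed for one NFS-over-UDP transfer.

    rpc_overhead is an assumed allowance for RPC/NFS headers, not a
    value taken from any particular implementation.
    """
    datagram = UDP_HEADER + rpc_overhead + block_size
    # Fragment payloads must be a multiple of 8 bytes (except the last).
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


def datagram_loss(block_size, fragment_loss, **kwargs):
    """Chance the whole datagram is lost, assuming independent drops."""
    return 1 - (1 - fragment_loss) ** ip_fragments(block_size, **kwargs)


def block_size_allowed(block_size, nfs_version):
    """True if block_size respects the V2 limit; V3 is not capped here."""
    if nfs_version == 2:
        return 0 < block_size <= NFS_V2_MAX_DATA
    return block_size > 0


if __name__ == "__main__":
    for size in (1024, 8192, 32768):
        print(size, ip_fragments(size), block_size_allowed(size, 2))
```

Under these assumptions an 8192 byte transfer needs 6 fragments and a
32768 byte transfer needs 23, so even a small per-fragment drop rate costs
far more at the larger size.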

Good luck with it. Finding these can be lots of fun, rick
ps: This is why I choose to do my development on a really old version of
    BSD, since I'd rather avoid buffer cache surprises when trying to get
    other things working:-)