On Fri, 23 Jan 2009, Greg A. Woods; Planix, Inc. wrote:
On 23-Jan-2009, at 9:00 AM, Johnny Billquist wrote:
Compared to the first hang, this is in a way inverted. During the first
hang it didn't respond to pings, but I could break into ddb. This time it
responds to pings, but I can't break into ddb.
Oddly, this sounds very much like the kinds of bizarre behaviour I would
see on my i386 server when it was running out of kernel memory. Kernel
memory is very restricted on i386, even if the machine has 4GB or more of
RAM (which is why I tried to use Alphas for all my servers which need 4GB
or more of RAM).
My solution on i386 was to restrict BUFCACHE to 10% which, IIUC, keeps the
number of buffer headers low enough that they don't squeeze the kernel for
too much memory.
You might also want to reduce kern.maxvnodes and/or any other tuning
variable that might affect performance but which won't cause actual
operational problems (e.g. don't reduce NMBCLUSTERS if the machine handles
lots of network traffic, as not having enough memory dedicated to
networking, especially with NFS, will also cause actual failures, not just
degraded performance).
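
For reference, the tuning described above might look roughly like the
following on NetBSD. This is only a sketch: the BUFCACHE figure is the 10%
mentioned above, while the kern.maxvnodes number is a made-up placeholder
and not a recommendation for any particular machine.

```
# Kernel configuration file (takes effect after rebuilding the kernel):
#   options BUFCACHE=10        # buffer cache as a percentage of RAM

# Check the current vnode limit before changing anything
sysctl kern.maxvnodes

# Lower it at run time (20000 is an illustrative value only)
sysctl -w kern.maxvnodes=20000
```

NMBCLUSTERS is deliberately left alone here, for the reason given above.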
I wonder if there is some issue where some resource is not being
freed fast enough. On faster platforms it's much harder to trigger the
issue, but on the VAX it's much easier...