On Fri, 23 Jan 2009, Greg A. Woods; Planix, Inc. wrote:
On 23-Jan-2009, at 9:00 AM, Johnny Billquist wrote:
Compared to the first hang, this is in a way inverted. In the first
hang it didn't respond to pings, but I could break into ddb. This time
it responds to pings, but I can't break into ddb.
Oddly, this sounds very much like the kind of bizarre behaviour I
would see on my i386 server when it was running out of kernel memory.
Kernel memory is very restricted on i386, even if the machine has 4GB
or more RAM (which is why I tried to use Alphas for all my servers
which need 4GB or more RAM).
My solution on i386 was to restrict BUFCACHE to 10% which, IIUC, keeps
the number of buffer headers low enough that they don't squeeze the
kernel for too much memory.
You might also want to reduce kern.maxvnodes and/or any other tuning
variable that might affect performance but which won't cause actual
operational problems (e.g. don't reduce NMBCLUSTERS if the machine
handles lots of network traffic, as not having enough memory dedicated
to networking, especially with NFS, will also cause actual failures,
not just degraded performance).
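
As a rough sketch of the tuning described above, assuming a NetBSD
kernel config file and /etc/sysctl.conf: the 10% BUFCACHE figure is
the one mentioned above, but the kern.maxvnodes value is only a
placeholder and not a recommendation. Check the current value with
"sysctl kern.maxvnodes" before picking one.

```text
# Kernel config file fragment (requires a kernel rebuild):
# limit the buffer cache to 10% of physical memory.
options 	BUFCACHE=10

# /etc/sysctl.conf fragment (applied at boot):
# lower the vnode limit. 20000 is a hypothetical placeholder value.
kern.maxvnodes=20000
```

NMBCLUSTERS is deliberately left alone here, for the reason given
above.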
I wonder if there is some issue where some resource is not being
freed fast enough. On faster platforms it's much harder to trigger
the issue, but on the VAX it's much easier...