Subject: Re: help me analyze my servers failure
To: port-mac68k <port-mac68k@netbsd.org>
From: None <josh@ssimr.com>
List: port-mac68k
Date: 04/13/2001 21:31:58
I'm trying to follow through on analyzing what made my server fail. It
has been up for about a week now. I keep top running in a telnet session
from another machine on the network, and I also e-mail the output of top
and swapctl to a few addresses.
As far as I can tell, my machine:
1. uses between 40 and 46 MB of the 68 MB available,
2. never touches swap,
3. runs about 25 to 30 processes,
and is stable as a rock while I watch it.
I'm starting to wonder whether there were some external factors.
On Mon, Apr 09, 2001 at 07:39:35PM -0500, Bob Nestor wrote:
> josh@ssimr.com
>
> >On Mon, Apr 09, 2001 at 04:42:09PM -0700, Cameron Kaiser wrote:
> >> > What happens is I can't get in, nor can any clients connect to access
> >> > the services. It is still up on my network (judging from port-scans
> >> > done with agnet tools from another Mac on my network). I have been
> >> > running it headless, but when I stick the monitor back on - even
> >> > though I can still get a display - I can't get a keyboard response. I
> >> > find out either when I try to telnet into the server or when someone
> >> > in the house tries to check or send mail and can't.
> >>
> >> There's not enough information here.
> >
> >That is correct.
> >
> >Bob Nestor suggested I'm running out of swap space. I mention this here
> >because I have the same problem testing that as I will with everything
> >else, namely that everything is very clean right now. If Bob is right,
> >then I have an application slowly leaking memory. I'm running bind,
> >sendmail, gnu-pop3 daemon from compiled source. The apache daemon is
> >from a package at install time and the telnet daemon was done with the
> >install.
>
> When I had the problem I discovered that things like telnet didn't work
> because the daemons had been killed as part of the effort by the kernel
> to recover from lack of SWAP space. Basically when the system runs out
> of SWAP space it starts shedding processes in an attempt to free up SWAP.
> Unfortunately it seems to get the low-numbered processes first, like the
> init process "1".
>
> I found the real culprit by leaving an open session running on the
> console that would hopefully survive the process killer when the system
> locked up. You might try this and/or leave top running on the console to
> see if you can determine the real state of the system when it locks up.
>
> -bob
--
Josh Kuperman
josh@ssimr.com
http://www.ssimr.com