tech-net: Re: problems with nmbcluster (?)

Subject: Re: problems with nmbcluster (?)
To: None <6bone@6bone.informatik.uni-leipzig.de>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: tech-net
Date: 01/07/2007 22:16:59
On Sun, Jan 07, 2007 at 08:28:10PM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
> On Sun, 7 Jan 2007, Manuel Bouyer wrote:
> 
> >Date: Sun, 7 Jan 2007 19:09:59 +0100
> >From: Manuel Bouyer <bouyer@antioche.eu.org>
> >To: 6bone@6bone.informatik.uni-leipzig.de
> >Cc: tech-net@NetBSD.org
> >Subject: Re: problems with nmbcluster (?)
> >
> >On Sun, Jan 07, 2007 at 05:44:48PM +0100, 
> >6bone@6bone.informatik.uni-leipzig.de wrote:
> >>hello,
> >>
> >>I have some problems with the network. I have to restart my server
> >>continuously, because after some days the server loses all connection to
> >>the network. You cannot establish any connections or do any pings. You can
> >>only restart the server. After the restart everything works fine for some
> >>days.....
> >>
> >>I have tested some kernels (3.0, 3.1, current....) but the same effect
> >>always occurs. The server runs no special services, only apache2 and
> >>postgresql from pkgsrc. I don't know why the problem only occurs on my
> >>system. It is a dual i386/PIII with IPv6 enabled and an Intel NIC.
> >>
> >>I cannot give you more special hints. Only one output from 'netstat -mss'
> >>after the connection was lost:
> >>
> >>1441 mbufs in use:
> >>         1150 mbufs allocated to data
> >>         291 mbufs allocated to packet headers
> >>132521 calls to protocol drain routines
> >>
> >>
> >>Can anyone give me a hint for a possible solution or workaround? The
> >>continuous restarts are no longer possible. I have already exchanged the
> >>complete hard- and software.
> >
> >What does 'vmstat -m|grep mclpl' show?
> >
> >-- 
> >Manuel Bouyer <bouyer@antioche.eu.org>
> >    NetBSD: 26 ans d'experience feront toujours la difference
> >--
> >
> 
> the uptime at the moment is only 4h, so I can only report the current
> output:
> 
> netstat -mss && vmstat -m|grep mclpl
> 
> 1497 mbufs in use:
>         1110 mbufs allocated to data
>         387 mbufs allocated to packet headers
> 34 calls to protocol drain routines
> 
> vmstat: Kmem statistics are not being gathered by the kernel.
> mclpl       2048     1578    0      938   408    74   334   398     4   512 

I suspect your system occasionally runs out of mclpl (the mbuf cluster pool),
and this causes the network adapter (or the IP stack) to stall. Try bumping
nmbclusters.

For example, on ftp.fr.netbsd.org I have it set to 8192.
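
A minimal sketch of how the limit might be inspected and raised, assuming a
release that exposes the kern.mbuf.nmbclusters sysctl as writable (otherwise
it would have to be set at kernel build time with "options NMBCLUSTERS=8192"
in the kernel config). The value 8192 is simply the one quoted above, not a
tuned recommendation for this machine:

```shell
# Show the current cluster limit
# (assumption: the kern.mbuf.nmbclusters sysctl exists on this kernel).
sysctl kern.mbuf.nmbclusters

# Raise the limit on the running system; needs root.
sysctl -w kern.mbuf.nmbclusters=8192

# Re-check the cluster pool afterwards, using the same command as
# earlier in this thread.
vmstat -m | grep mclpl
```

To keep the setting across reboots, a line such as
kern.mbuf.nmbclusters=8192 could be added to /etc/sysctl.conf.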

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 ans d'experience feront toujours la difference
--