Subject: Re: RAIDFrame and RAID-5
To: NetBSD-current Discussion List <current-users@NetBSD.org>
From: Frederick Bruckman <fredb@immanent.net>
List: current-users
Date: 10/28/2003 20:24:19
On Tue, 28 Oct 2003, Greg A. Woods wrote:
> [ On Monday, October 27, 2003 at 15:31:21 (-0600), Frederick Bruckman wrote: ]
> > Subject: Re: RAIDFrame and RAID-5
> >
> > That should give a maximum of 64MB of kvm (16K pages of 4K each),
> > which sounds like plenty, but I suppose it could have become too
> > fragmented.
>
> It seems I have:
>
> vm.nkmempages = 20455
> vm.anonmin = 10
> vm.execmin = 5
> vm.filemin = 10
> vm.maxslp = 20
> vm.uspace = 8192
> vm.anonmax = 80
> vm.execmax = 30
> vm.filemax = 50
>
> Isn't that almost 80MB, i.e. bigger than the "maximum"?
Yah, 80MB seems like plenty, but then your usage pattern evidently
called for some really big allocations in the RAID and anon pools.
Those "min" and "max" values are percentages for managing use of the
unified buffer cache, not the kernel memory allocator.
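As a sanity check on the numbers: 20455 pages at 4K each is indeed
about 80MB, so the auto-sizing gave you more than the 64MB I quoted.
If you want to pin the arena size in the config file instead, the knob
is the NKMEMPAGES family from options(4) -- just a sketch, with a value
picked purely for illustration:

	options NKMEMPAGES=24576	# 24576 * 4K pages = 96MB malloc arena
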
> (I don't have any KMEMPAGES* options in my kernel)
>
> > Hey, you're not swapping to RAID-5, are you? That
> > configuration is known to cause problems.
>
> No, but I am swapping to the RAID-1 (i.e. raid2b in my config):
>
> $ /sbin/swapctl -l
> Device 512-blocks Used Avail Capacity Priority
> /dev/raid2b 2097152 530648 1566504 25% 0
>
> (sorry I should have included that info before!)
So you are doing quite a bit of swapping. Does "top" sorted by "size"
show the same applications getting paged out and paged back in
constantly? I have seen a machine with only 16MB take 4 hours to boot;
that was resolved by raising vm.execmin to 10. You can play with those
numbers via "sysctl" while the system is running. That would not make
you run out of mbufs, though.
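For example, something like this (the values are hypothetical, only to
show the mechanics):

	# sysctl -w vm.execmin=10
	# sysctl -w vm.anonmax=70

sysctl(8) applies the change immediately, so you can watch the effect
in "top" without rebooting.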
> > Memory resource pool statistics
> > Name        Size Requests  Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> > mbpl         256    19224 10295    19087   203   186    17    44     1   inf    0
> >                           ^^^^^
> >                             |||
> >
> > Greg ran out of MBUFS, all right.
>
> Hmm.... interesting. I somehow missed seeing that entry in the fail column.
>
> Given that the fail count has not changed since (though I've not really
> stressed the system much since either), I would indeed guess that those
> failures coincided with the freeze.
>
> Name        Size Requests  Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> mbpl         256    35588 10295    35400   219   198    21    44     1   inf    3
>
> > What I would try, is to increase NKMEMPAGES until the problem isn't
> > reproducible anymore.
>
> I will try to do so, though I'm losing access to the second RAID-5
> (/save) very soon -- the disks must be moved back to the system they
> belong with to be put into production for a mail spool.
>
>
> BTW, what do we need to do to be able to report mbuf peak and max use as
> FreeBSD does?
>
> $ netstat -m
> 387/1056/26624 mbufs in use (current/peak/max):
>         386 mbufs allocated to data
>         1 mbufs allocated to packet headers
> 384/502/6656 mbuf clusters in use (current/peak/max)
> 1268 Kbytes allocated to network (6% of mb_map in use)
> 0 requests for memory denied
> 0 requests for memory delayed
> 0 calls to protocol drain routines
"options MBUFTRACE". See options(4).
Frederick