Subject: Re: memory stats on softupdates
To: None <tech-kern@netbsd.org>
From: George Georgalis <george@galis.org>
List: tech-kern
Date: 10/02/2007 18:48:55
On Fri, Sep 14, 2007 at 05:16:56PM +0100, Andrew Doran wrote:
>On Fri, Sep 14, 2007 at 11:51:00AM -0400, George Georgalis wrote:
>
>> I've been working on reproducing an archive (with public data)
>> that causes a kernel panic on netbsd-3 and netbsd-4 RC1. It was
>> suggested I use sysctl to get stats on how much memory soft updates
>> is using, but that seems to be available only on FreeBSD.
>>
>> Is there a way I can get memory stats for soft updates on a NetBSD
>> GENERIC kernel? It would also be useful to know how many pipe
>> resources are being used.
>>
>> Are there other resources I could look at too? The crash happens
>> when extracting a 21 GB tar.bz2 archive with lots of hard links and
>> data with a 10:1 compression ratio.
>
>Yes, have a look at the output of vmstat -m. The softdep pools are listed
>below; obviously this system is not running softdep, so there are no
>numbers. You probably also want to track the number of buffers in use:
>look at the buf* pools and/or use 'systat bufcache'.
>
>The maximum number of softdep operations is bounded by max_softdeps,
>which by default is set to:
>
> max_softdeps = desiredvnodes * 4;
>
>Andrew
>
>Memory resource pool statistics
>Name           Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
>sdpcpool        124        0    0        0     0     0     0     0     0   inf    0
>pagedeppl        68        0    0        0     0     0     0     0     0   inf    0
>inodedeppl       88        0    0        0     0     0     0     0     0   inf    0
>newblkpl         36        0    0        0     0     0     0     0     0   inf    0
>bmsafemappl      36        0    0        0     0     0     0     0     0   inf    0
>allocdirectpl    80        0    0        0     0     0     0     0     0   inf    0
>indirdeppl       32        0    0        0     0     0     0     0     0   inf    0
>allocindirpl     64        0    0        0     0     0     0     0     0   inf    0
>freefragpl       40        0    0        0     0     0     0     0     0   inf    0
>freeblkspl      172        0    0        0     0     0     0     0     0   inf    0
>freefilepl       36        0    0        0     0     0     0     0     0   inf    0
>diraddpl         36        0    0        0     0     0     0     0     0   inf    0
>mkdirpl          32        0    0        0     0     0     0     0     0   inf    0
>dirrempl         36        0    0        0     0     0     0     0     0   inf    0
>newdirblkpl      20        0    0        0     0     0     0     0     0   inf    0
>
Thanks. So these are the pools specific to soft updates?

To narrow down my problem, I applied the note in
ftp://ftp.netbsd.org/pub/NetBSD-daily/netbsd-3/200709300000Z/LAST_MINUTE
about setting vm.bufmem_hiwater and vm.bufmem_lowater; my hiwater
was negative 1.7 GB (-1718112256) on a 16 GB quad-core (2x2 CPU)
Opteron. The problem persisted with reasonable values set at boot.
Next I reduced the machine to 2 GB of RAM, removed FC altogether,
and stress-tested SATA (without soft updates): no problem. I'm now
stress-testing over LSI FC with 2 GB of RAM. If that passes, I'll
try again with soft updates. If that works too, I think I've
identified the cause as having more than 2 GB of RAM in this host
(it failed with 4 GB as well).

As I began this run, vmstat -H printed a "deadbeef" error to stderr
(see the output below). It doesn't do that now, but how significant
is a corrupted hash chain?

All recent testing has been on a pre-RC2 netbsd-4 kernel and
userland.
// George
+ vmstat -efH
vmstat: kptr deadbeefdeadbf67: hash chain corrupted: kvm_read: Bad address
5609 forks total
144 forks blocked parent
165 forks shared address space with parent
event                           total   rate type
uvmmap ubackmerge              429657     77 misc
uvmmap uforwmerge                   2      0 misc
uvmmap unomerge                355362     64 misc
uvmmap kbackmerge               15150      2 misc
uvmmap kforwmerge                 458      0 misc
uvmmap kbimerge                 10521      1 misc
uvmmap knomerge               3607786    650 misc
uvmmap map_call               4418936    797 misc
uvmmap mlk_call              25763213   4647 misc
uvmmap mlk_hint              21661633   3907 misc
uvmmap uke_alloc                30382      5 misc
uvmmap uke_free                 28126      5 misc
uvmmap ukh_alloc                  147      0 misc
uvmmap ukh_free                    45      0 misc
pdpolicy reactexec              49096      8 misc
pdpolicy reactanon              90103     16 misc
vmcmd calls                     98091     17 misc
vmcmd extends                    5006      0 misc
vmcmd kills                     10374      1 misc
timecounter binuptime       114252449  20612 misc
timecounter bintime         114252521  20612 misc
timecounter nanotime         85544836  15432 misc
timecounter microtime        28709081   5179 misc
timecounter getnanouptime       10197      1 misc
timecounter getmicrouptime    4226616    762 misc
timecounter getmicrotime     37979641   6851 misc
timecounter setclock                2      0 misc
bus_dma nbouncebufs                 1      0 misc
bus_dma loads                 2466191    444 misc
cpu0 softclock                 552687     99 intr
cpu0 softnet                     3779      0 intr
cpu0 softserial                     1      0 intr
cpu0 timer                     554867    100 intr
cpu0 FPU flush IPI                  2      0 intr
cpu0 FPU synch IPI                531      0 intr
cpu0 TLB shootdown IPI       61738730  11138 intr
cpu1 timer                     552869     99 intr
cpu1 FPU flush IPI                  6      0 intr
cpu1 FPU synch IPI                549      0 intr
cpu1 TLB shootdown IPI      107342194  19365 intr
cpu2 timer                     553216     99 intr
cpu2 FPU flush IPI                  1      0 intr
cpu2 FPU synch IPI                654      0 intr
cpu2 TLB shootdown IPI      108080308  19498 intr
cpu3 timer                     552016     99 intr
cpu3 FPU flush IPI                  1      0 intr
cpu3 FPU synch IPI                638      0 intr
cpu3 TLB shootdown IPI      105741881  19076 intr
ioapic0 pin 21                   4596      0 intr
ioapic0 pin 14                      6      0 intr
ioapic0 pin 22                 155263     28 intr
ioapic0 pin 23                     34      0 intr
ioapic0 pin 16                1955320    352 intr
ioapic0 pin 17                    254      0 intr
ioapic0 pin 3                       1      0 intr
                      total     used     util      num  average  maximum
hash table          buckets  buckets        %    items    chain    chain
--
George Georgalis, information system scientist <IXOYE><