Subject: kern/33076: reproducible pool free list corruption
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Martin Husemann <martin@aprisoft.de>
List: netbsd-bugs
Date: 03/14/2006 08:45:01
>Number: 33076
>Category: kern
>Synopsis: reproducible pool free list corruption
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Mar 14 08:45:01 +0000 2006
>Originator: Martin Husemann
>Release: NetBSD 3.99.16
>Organization:
>Environment:
System: NetBSD martins.aprisoft.de 3.99.16 NetBSD 3.99.16 (MARTINS) #0: Mon Mar 13 09:38:18 CET 2006 martin@martins.aprisoft.de:/usr/src/sys/arch/amd64/compile/MARTINS amd64
Architecture: x86_64
Machine: amd64
>Description:
On this machine, I can reliably panic the kernel:
pool_get(mbpl): free list modified: magic=ffffffff; page 0xffff800016ce9000;
item addr 0xffff800016ce9e00
(the backtrace varies, as expected with this kind of corruption)
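
For readers unfamiliar with this kind of check, here is a minimal sketch of
how a free-list magic test can catch a write to an already-freed item. It is
not the kernel's pool code: toy_pool, free_item, toy_put, toy_get and the
ITEM_MAGIC value are all hypothetical names chosen for illustration. It
assumes the allocator stamps a magic word into each item when it is freed and
verifies that word when the item is handed out again.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic value; the real kernel constant may differ. */
#define ITEM_MAGIC 0xdeaddeadU

/* Header written into each free item (illustrative layout only). */
struct free_item {
	uint32_t magic;          /* stamped when the item is freed */
	struct free_item *next;  /* next item on the free list */
};

struct toy_pool {
	struct free_item *free_head;
};

/* Return an item to the pool and stamp it with the magic value. */
static void
toy_put(struct toy_pool *pp, void *v)
{
	struct free_item *fi = v;

	fi->magic = ITEM_MAGIC;
	fi->next = pp->free_head;
	pp->free_head = fi;
}

/*
 * Take an item from the pool. Returns NULL if the pool is empty or if
 * the head item's magic word was overwritten while it sat on the free
 * list; in the latter case *corrupt is set to 1. A real kernel would
 * panic at that point instead of returning.
 */
static void *
toy_get(struct toy_pool *pp, int *corrupt)
{
	struct free_item *fi = pp->free_head;

	*corrupt = 0;
	if (fi == NULL)
		return NULL;
	if (fi->magic != ITEM_MAGIC) {
		*corrupt = 1;
		return NULL;
	}
	pp->free_head = fi->next;
	return fi;
}
```

In this sketch the damage is only noticed at the next toy_get, long after the
stray write happened, which is consistent with the backtrace pointing at an
unrelated caller each time.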
>How-To-Repeat:
I run two parallel cvs checkouts (one for xsrc, one for src) in /tmp.
That does it on this machine.
Previously I suspected random memory corruption, since the problem does not
seem to happen when I limit the usable RAM to 2GB. But after various other
fixes, this exact panic (with varying addresses, of course) is the only
kernel panic still happening.
So now I suspect a timing-sensitive race condition instead.
>Fix:
Hints on how to panic closer to the culprit would be welcome ;-)