Subject: netbsd-4/sparc MP kernel panic
To: None <tech-kern@netbsd.org>
From: John D. Baker <jdbaker@mylinuxisp.com>
List: tech-kern
Date: 07/01/2007 23:08:10
I originally posted the following on port-sparc@. Now that I have output
from a LOCKDEBUG kernel, it was suggested that I post it here. First,
my original message:
--------------------
While rebuilding the system, the console of my dual hypersparc-150
SS20 reported the following:
[...]
Jun 24 11:38:08 ss20a /netbsd: nfs server halloran:/r0/d2/NetBSD: is
alive again
Jun 24 15:38:40 ss20a /netbsd: nfs server hxcall(cpu1,0xf00087e4):
couldn't ping cpus:panic: cpu0cpu0: stuck on lock@f0353344
syncing disks... alloran:/r0/d2/NetBSD: not responding
The machine is running 4.0BETA_2 from around 27 March 2007 with an MP
kernel customized from GENERIC/GENERIC.MP, built with -mcpu=hypersparc.
System sources are on the file server "halloran"; everything else goes to
local disk on the machine "ss20a" itself.
It was doing the following at the time (from the frozen SSH session):
[...]
# install /d1/nbsd/DEST/sparc/bin/cp
STRIP=/d1/nbsd/tools/sparc/bin/sparc--netbsdelf-strip
/d1/nbsd/tools/sparc/bin/nbinstall -U -M /d1/nbsd/DEST/sparc/METALOG -D
/d1/nbsd/DEST/sparc -h sha1 -N /amd/halloran/r0/d2/NetBSD/src/etc -c
-r -o root -g wheel -m 555 cp /d1/nbsd/DEST/sparc/bin/cp
--- install-games ---
--- install-backgammon ---
--- install-usr.sbin ---
--- /d1/nbsd/DEST/sparc/usr/share/man/cat8/accton.0 ---
nfs server halloran:/r0/d2/NetBSD: not responding
--------------------
And the message I posted today with LOCKDEBUG output:
---------------------
This time around, I built and installed kernels built from the latest
netbsd-4 sources (updated late 28 June). During the subsequent build
of the userland, I got the same panic again.
When I'd recovered from that, I built and installed kernels with
"options LOCKDEBUG". During the restarts of the userland build, I got
the panic twice more, but with more information.
The output is below. The first line is from the non-LOCKDEBUG kernel,
the subsequent two groups are from the LOCKDEBUG kernel. The files
referenced live on my file server, via NFS.
[...]
xcall(cpu1,0xf00087e4): couldn't ping cpus:panic: cpu0cpu0: stuck on
lock@f0317274
[...]
xcall(cpu1,0xf00087e4): couldn't ping cpus:panic: cpu0cpu0: stuck on
lock@f0329604
syncing disks...
simple_lock: locking against myself
lock: 0xf0326d24, currently at:
/amd/halloran/r0/d2/NetBSD/src/sys/kern/kern_synch.c:1237
on CPU 0
last locked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/sys_generic.c:1129
last unlocked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/kern_synch.c:744
switching with held simple_lock 0xf035a588 CPU 0
/amd/halloran/r0/d2/NetBSD/src/sys/kern/subr_pool.c:1292
simple_lock: uninitialized lock
lock: 0xf035a588, currently at:
/amd/halloran/r0/d2/NetBSD/src/sys/kern/subr_pool.c:935
on CPU 1
last locked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/subr_pool.c:1292
last unlocked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/subr_pool.c:1294
[...]
xcall(cpu0,0xf00087e4): couldn't ping cpus:panic: cpu1cpu1: stuck on
lock@f0329604
syncing disks...
simple_lock: locking against myself
lock: 0xf0326d24, currently at:
/amd/halloran/r0/d2/NetBSD/src/sys/kern/kern_synch.c:1237
on CPU 1
last locked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/sys_generic.c:1129
last unlocked: /amd/halloran/r0/d2/NetBSD/src/sys/kern/kern_synch.c:744
pool_get(PR_WAITOK) with held simple_lock 0xf5702c68 CPU 1
/amd/halloran/r0/d2/NetBSD/src/sys/kern/tty.c:2487
[ last message repeated 107 times ]
[ system hung ]
Subsequent attempts to finish building userland have failed, but I
suspect local filesystem corruption from the prior panics. The build is
being restarted from scratch.
------------------------
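For anyone not familiar with the "simple_lock: locking against myself"
diagnostic in the LOCKDEBUG output above: it is reported when the CPU trying
to take a simple lock is the same CPU that already holds it, so spinning
could never succeed. Below is a minimal single-threaded sketch of that kind
of check in C. The dbg_lock names, fields, and result codes are hypothetical
and chosen only for illustration; they are not taken from the NetBSD
sources, and the real LOCKDEBUG code uses atomic operations and differs in
detail.

```c
#include <stdbool.h>
#include <stddef.h>

#define DBG_NO_OWNER (-1)

/* Hypothetical debugging lock; NOT NetBSD's struct simplelock. */
struct dbg_lock {
    bool        held;       /* true while some CPU holds the lock */
    int         owner_cpu;  /* CPU that took it, or DBG_NO_OWNER */
    const char *lock_file;  /* source file of the last acquisition */
    int         lock_line;  /* source line of the last acquisition */
};

enum dbg_result {
    DBG_OK,             /* lock acquired */
    DBG_SELF_DEADLOCK,  /* caller's CPU already holds the lock */
    DBG_BUSY            /* a different CPU holds the lock */
};

/* Put the lock into a known, unlocked state. */
static void
dbg_lock_init(struct dbg_lock *l)
{
    l->held = false;
    l->owner_cpu = DBG_NO_OWNER;
    l->lock_file = NULL;
    l->lock_line = 0;
}

/*
 * Try to take the lock on behalf of `cpu`. Rather than spinning, report
 * why the attempt failed so the caller can print a diagnostic.
 */
static enum dbg_result
dbg_lock_try(struct dbg_lock *l, int cpu, const char *file, int line)
{
    if (l->held) {
        /* Same CPU holding and requesting: spinning would never end. */
        return (l->owner_cpu == cpu) ? DBG_SELF_DEADLOCK : DBG_BUSY;
    }
    l->held = true;
    l->owner_cpu = cpu;
    l->lock_file = file;   /* remembered for a "last locked:" report */
    l->lock_line = line;
    return DBG_OK;
}

/* Release the lock and forget the owner; the location is kept. */
static void
dbg_lock_release(struct dbg_lock *l)
{
    l->held = false;
    l->owner_cpu = DBG_NO_OWNER;
}
```

In this sketch, a second dbg_lock_try() from the same CPU without an
intervening dbg_lock_release() returns DBG_SELF_DEADLOCK, which corresponds
to the "locking against myself" case; the recorded file and line play the
role of the "last locked:" location in the output.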
One thing I didn't mention in prior posts: Several hours prior to the
original panic, there were a number of console messages indicating a
fault in one of the memory modules. There was just one burst of them
and they've not reappeared since. Maybe this is just a symptom of a
memory module going bad?
Thanks.
--
John D. Baker NetBSD Darwin/MacOS X
http://mylinuxisp(dot)com/(tilde)jdbaker/ OpenBSD FreeBSD
BSD. It just sits there and _works_.
GPG fingerprint = D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645