Subject: kernel panics: lock error (mutex)
To: None <current-users@netbsd.org>
From: Arto Selonen <arto@selonen.org>
List: current-users
Date: 12/13/2007 14:34:10
Hi!
Recently our NetBSD-current system (i386) has become fairly unstable: it
used to stay up for months, interrupted only by current upgrades, and now
panics every few days. I'm still tracking down the upgrade after which the
panics appeared or increased. I'm not sure this was introduced recently;
I rather suspect that some recent change has started to trigger an
existing problem more frequently. Specifics pending...
Anyway, for roughly the past month the system has crashed with kernel
panics, as soon as one day after a reboot and at most after about a week
of uptime. Most of the panics have been mutex-related, but I haven't
written down details, as I expected this to be a transient problem with
current that would go away in a future upgrade, and so I have simply
grabbed the latest sources, upgraded and tried again. Here is the latest
panic report, copied from screen (no serial console), that appeared
while building yet another upgrade from a couple of days ago (this one
was on 3.99.39):
Mutex error: mutex_spin_retry: locking against myself
lock address: .....
current cpu : 0
current lwp : .....
owner field : ..... wait/spin 0/1
panic: lock error
Stopped in pid 272.1 (squid)
And here is just the function name trace, in case it might be enough to
give you ideas what could be causing this:
db> tr
breakpoint
lockdebug_abort
mutex_abort
mutex_owner
cv_timedwait_sig
pollcommon
sys_poll
syscall
db> reboot 0x104
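As far as I understand the message, "locking against myself" means that
the lock already appears to be held by whoever is trying to take it. To
make clear what I think is being detected, here is a minimal sketch in C.
It is a toy, single-threaded model: the names (toy_mutex, toy_mutex_enter,
and so on) are made up for illustration, and this is not the kernel's
actual mutex code.

```c
#include <assert.h>

/* Toy model only; hypothetical names, not NetBSD's mutex implementation. */
#define TOY_NO_OWNER (-1)

struct toy_mutex {
    int owner;              /* id of the current holder, or TOY_NO_OWNER */
};

enum toy_result {
    TOY_OK,                 /* lock acquired */
    TOY_BUSY,               /* held by someone else; a real lock would spin or sleep */
    TOY_SELF_DEADLOCK       /* caller already holds it: "locking against myself" */
};

static void toy_mutex_init(struct toy_mutex *m)
{
    m->owner = TOY_NO_OWNER;
}

/* Try to take the lock on behalf of `self` (an assumed LWP/CPU id). */
static enum toy_result toy_mutex_enter(struct toy_mutex *m, int self)
{
    if (m->owner == self)
        return TOY_SELF_DEADLOCK;   /* waiting here could never succeed */
    if (m->owner != TOY_NO_OWNER)
        return TOY_BUSY;
    m->owner = self;
    return TOY_OK;
}

/* Release the lock; only the current holder may do so. */
static void toy_mutex_exit(struct toy_mutex *m, int self)
{
    assert(m->owner == self);
    m->owner = TOY_NO_OWNER;
}
```

Calling toy_mutex_enter() twice with the same id returns TOY_SELF_DEADLOCK
the second time, which is the situation the panic message seems to
describe. Whether the real owner field was correct or corrupted in my case
I can't tell from the screen output alone.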
So, I have a crash dump available. Should I file a PR and continue from
there with more data (kernel config, etc.), or is there something simple
that I might have missed, either in updating the kernel config or in
something that a normal, simple current upgrade would miss (cvs update,
build tools, build kernel, build world, boot kernel, install world)?
Any suggestions for enabling or disabling specific debugging or similar
kernel config options to track this down? (I should already have most of
them enabled from previous problems over the past couple of years.)
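For reference, the kind of debugging options I mean are along these
lines. This is only a sketch of typical entries, not a copy of my actual
kernel config, and the comments reflect my own understanding of what each
option does:

```
# Illustrative fragment for an i386 kernel config file
options 	DIAGNOSTIC	# cheap internal consistency checks
options 	DEBUG		# more expensive debugging checks
options 	LOCKDEBUG	# expensive lock sanity checks
options 	DDB		# in-kernel debugger
makeoptions	DEBUG="-g"	# also build netbsd.gdb with symbols
```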
I would like to see the system's stability improve before the longish
holiday season. ;-) (It's a firewall/gateway/web proxy.)
All comments welcome. :)
Artsi
--
#######======------ http://www.selonen.org/arto/ --------========########
Everstinkuja 5 B 35 Don't mind doing it.
FI-02600 Espoo arto@selonen.org Don't mind not doing it.
Finland tel +358 50 560 4826 Don't know anything about it.