Subject: sparc64 / 2.0.1 and thread crashes (was: Re: Ultra 5 / 2.0 / panic: lockmgr: no context)
To: None <port-sparc64@netbsd.org>
From: Gert Doering <gert@greenie.muc.de>
List: port-sparc64
Date: 02/03/2005 15:12:13
Hi,
you might remember this thread...
On Tue, Jan 11, 2005 at 09:07:42AM +0100, Gert Doering wrote:
> all of a sudden, my Ultra 5 with NetBSD 2.0 on it is acting up.
>
> Every night between 2 and 5 a.m. - usually at a time with *NO* major
> activity going on at all (!) - the machine dies.
[..]
> Before this problem started, I had done some major pkg updates - perl 5.6
> and spamassassin 2.64 to perl 5.8.6 and spamassassin 3.0.1. Also this
> machine is doing some more UUCP spooling activity nowadays (but still
> very light load only).
>
> This is NetBSD 2.0 (from CVS), Sparc64, Ultra 5.
As the machine crashes only once per night, it has been somewhat difficult
to pin the cause down as "bad hardware" or "other issues", especially since
it *didn't* crash on some nights, leaving me wondering "did I find it,
or did it just not crash?"...
I've now tried re-seating and swapping all central components:
- re-seated the CPU module
- swapped the CPU module (400 MHz -> 360 MHz module)
- swapped the power supply
- re-seated the DRAM modules, swapping slots 0+1 vs. 2+3
Nothing had any significant effect.
Today I decided "let it run with only half of the DRAM SIMMs, maybe one
half is bad, and the other one is good".
Surprise! Now the machine crashes after about 2-3 hours - regardless of
*which* SIMMs are used, so it's not a "single bad SIMM".
With all 4 SIMMs the machine has 512 MB DRAM, and is hardly ever swapping
- with only 2 SIMMs, it has 256 MB DRAM, and since it's running SpamAssassin,
it *will* swap.
*Bingo*!
Before the machine started acting up, I did upgrade *PERL* from 5.6.x to
5.8.6nb1, which is built *threaded by default*. Always.
... and "as per the other thread", Sparc64 will crash when threaded apps
start swapping.
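For anyone who wants to confirm whether their own perl is a threaded build: perl reports this via `perl -V:usethreads`, which prints `usethreads='define';` for a threaded build and `usethreads='undef';` otherwise. The following is only a small sketch around that command. The function names are made up for illustration, and it assumes a `perl` binary is on the PATH.

```python
import re
import subprocess


def parse_usethreads(output: str) -> bool:
    """Interpret the output of `perl -V:usethreads`.

    A threaded build prints usethreads='define'; any other value
    (typically 'undef') is treated as non-threaded.
    """
    match = re.search(r"usethreads='([^']*)'", output)
    if match is None:
        raise ValueError(f"unexpected perl -V output: {output!r}")
    return match.group(1) == "define"


def perl_is_threaded(perl: str = "perl") -> bool:
    """Run the given perl binary (hypothetical default: 'perl' on PATH)
    and report whether it was built with thread support."""
    result = subprocess.run(
        [perl, "-V:usethreads"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_usethreads(result.stdout)
```

Calling `perl_is_threaded()` returns True for a threaded perl. A path to another binary can be passed in to compare two builds side by side, e.g. before and after a rebuild without threads.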
I've seen a patch on the list (Chuck Silvers, 01/26/2005) that's supposed
to cure Sparc/Sparc64+threads+swapping crashes. I'm not sure whether it's
supposed to work on 2.0.1, though - it'll definitely need some manual
adjustments.
What would be your recommendation to tackle this issue?
- rebuild perl without threads?
- patch the kernel with Chuck's patch?
- run without swap until this is integrated in the "normal" 2.0.1
source tree?
I'm a bit unsure what might be the best way to tackle this now...
gert
--
USENET is *not* the non-clickable part of WWW!
//www.muc.de/~gert/
Gert Doering - Munich, Germany gert@greenie.muc.de
fax: +49-89-35655025 gert@net.informatik.tu-muenchen.de