Port-sparc64 archive
Re: SIR Reset Watchdog Reset
On Fri, 12 Mar 2010, Jochen Kunz wrote:
> I have:
> NetBSD 5.0_STABLE (GENERIC.MP) #0: Fri Nov 27 10:40:10 CET 2009
>
> jkunz@MissSophie:/datengrab/src/NetBSD/release-5/objdir/sparc64/sys/arch
> /sparc64/compile/GENERIC.MP
> total memory = 1024 MB
> avail memory = 992 MB
> timecounter: Timecounters tick every 10.000 msec
> mainbus0 (root): SUNW,Ultra-60 (Netra t 1120/1125): hostid 12345678
> cpu0 at mainbus0: SUNW,UltraSPARC-II @ 400.004 MHz, UPA id 0
> cpu0: 32K instruction (32 b/l), 16K data (32 b/l), 4096K external (64 b/l)
> cpu1 at mainbus0: SUNW,UltraSPARC-II @ 400.004 MHz, UPA id 2
> cpu1: 32K instruction (32 b/l), 16K data (32 b/l), 4096K external (64 b/l)
>
> This machine crashes about once a week with:
> NetBSD/sparc64 (Maja) (console)
>
> login:
> SIR Reset
>
> Watchdog Reset
> Externally Initiated Reset
> {2} ok
>
> The machine serves as a NFS and NIS server with nearly no load.
> / and /usr are on a pair of RAIDframe mirrored disks, /home is an
> external hardware RAID5. Filesystems are mounted with option "log".
>
> Software bug or broken hardware?
Dunno.
SIR is a software-initiated reset. locore.s has them sprinkled around in
some places where the kernel gets so stuffed up it can't recover. Once
you get one of those you should do:
ok .trap-registers
ok .registers
ok ctrace
ok 0 .window
ok 1 .window
ok 2 .window
until you get to
ok 7 .window
The most important information is .trap-registers and .registers. You
need to correlate the tpc and tnpc addresses with specific locations in
the specific kernel you're running to determine what sequence of traps got
you in the position to execute the sir instruction. If you can figure out
which specific sir instruction you hit that often gives you enough
information to figure out why the kernel took a dive.
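
To illustrate the correlation step, here is a minimal sketch that maps an address such as tpc to the nearest preceding kernel symbol. It assumes you have a symbol listing in the "address type name" layout that `nm -n` prints for the exact kernel that crashed. The symbol names, addresses, and the tpc value below are all made up for illustration; none of them come from a real kernel or from the crash above.

```python
import bisect

# Hypothetical symbol listing in "address type name" form.
# These names and addresses are invented for illustration only.
SAMPLE_NM = """\
0000000001000000 T hypo_kernel_text
0000000001008a40 T hypo_trap_handler
0000000001008c00 T hypo_next_handler
"""


def parse_nm(text):
    """Parse "<hex addr> <type> <name>" lines into a sorted (addr, name) list."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols (no address)
        addr, _kind, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # first field was not a hex address
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) for the closest symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # addr lies below the first known symbol
    base, name = symbols[i]
    return name, addr - base


symbols = parse_nm(SAMPLE_NM)
hypothetical_tpc = 0x1008A5C  # stand-in for a value read from .trap-registers
name, offset = resolve(symbols, hypothetical_tpc)
print(f"{name}+{offset:#x}")  # prints "hypo_trap_handler+0x1c"
```

The symbol+offset result only narrows things down to a routine; you would still need to disassemble around that offset in the same kernel image to see which instruction it is.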
Eduardo