Subject: port-alpha/23588: alpha SMP kernel dies horribly after completing autoconfig
To: gnats-bugs@gnats.netbsd.org
From: he@netbsd.org
List: netbsd-bugs
Date: 11/28/2003 11:52:45
>Number: 23588
>Category: port-alpha
>Synopsis: alpha SMP kernel dies horribly after completing autoconfig
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-alpha-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Nov 28 10:53:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Havard Eidnes
>Release: NetBSD 1.6ZF Nov 28 2003
>Organization:
Unorganized, Inc.
>Environment:
System: NetBSD kveite.urc.uninett.no 1.6ZF NetBSD 1.6ZF (CS20.MP) #8: Fri Nov 14 14:14:17 CET 2003 he@kveite.urc.uninett.no:/usr/obj/sys/arch/alpha/compile/CS20.MP alpha
Architecture: alpha
Machine: alpha
>Description:
I just updated my sources, did an update build of the tools,
and built a new kernel for my CS20 from scratch.
The kernel completes the autoconfig phase, but then either jumps
into nowhere-land (it does not respond to BREAK on the console)
or dies horribly after init is started. So far I have
observed three different failure modes:
1) it drops back to SRM:
root on sd0a dumps on sd0b
root file system type: ffs
halted CPU 0
CPU 1 is not halted
halt code = 4
invalid PTBR
PC = fffffc00004f2a30
P00>>>
2) it appears to get stuck (DDB does not respond to BREAK):
root on sd0a dumps on sd0b
root file system type: ffs
Fri Nov 28 10:15:30 GMT 2003
[BREAK][BREAK]
3) it gets a fatal kernel trap followed by what seemed like an
endless loop of fatal kernel panics:
root on sd0a dumps on sd0b
root file system type: ffs
Fri Nov 28 10:17:55 GMT 2003
CPU 1: fatal kernel trap:
CPU 1 trap entry = 0x4 (unaligned access fault)
CPU 1 a0 = 0xfffffc0000420adc
CPU 1 a1 = 0x29
CPU 1 a2 = 0x2
CPU 1 pc = 0xfffffc00004fe7fc
CPU 1 ra = 0xfffffc00003004b8
CPU 1 pv = 0xfffffc00004fdf00
CPU 1 curlwp = 0xfffffc0000420a04
CPU 1: fatal kernel trap:
CPU 1 trap entry = 0x4 (unaligned access fault)
CPU 1 a0 = 0xfffffc0000420a34
CPU 1 a1 = 0x29
CPU 1 a2 = 0x12
CPU 1 pc = 0xfffffc00004fde08
CPU 1 ra = 0xfffffc00004fdde0
CPU 1 pv = 0xfffffc0000443250
CPU 1 curlwp = 0xfffffc0000420a04
panic: alpha_send_ipi: bogus cpu_id
Begin traceback...
CPU 1: fatal kernel trap:
CPU 1 trap entry = 0x4 (unaligned access fault)
CPU 1 a0 = 0xfffffc0000420a34
CPU 1 a1 = 0x29
CPU 1 a2 = 0x12
CPU 1 pc = 0xfffffc00004fde08
CPU 1 ra = 0xfffffc00004fdde0
CPU 1 pv = 0x0
CPU 1 curlwp = 0xfffffc0000420a04
alpha trace requires known PC =eject=
End traceback...
etc. etc. etc.
The difference between 2) and 3) is that in 3) I pressed ENTER
on the console (and got it echoed before the ream of trap/
panic messages).
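A note on reading the trap dumps in 3), offered as a rough sketch
rather than a diagnosis. It assumes the usual Alpha convention for
unaligned access faults (a0 = faulting virtual address, a1 = opcode
of the faulting instruction, a2 = register number) and a small,
incomplete opcode table; the helper name describe_unaligned is made
up for illustration.

```python
# Hypothetical helper for reading the unaligned access fault dumps above.
# Assumption: a0 = faulting address, a1 = instruction opcode.
# Only a few Alpha load/store opcodes are listed; the table is not complete.
ACCESS_SIZE = {
    0x28: ("ldl", 4),
    0x29: ("ldq", 8),
    0x2C: ("stl", 4),
    0x2D: ("stq", 8),
}

CURLWP = 0xfffffc0000420a04  # curlwp value printed in every trap dump


def describe_unaligned(a0, a1):
    """Return (mnemonic, access size, misalignment in bytes)."""
    name, size = ACCESS_SIZE[a1]
    return name, size, a0 % size


for a0 in (0xfffffc0000420adc, 0xfffffc0000420a34):
    name, size, off = describe_unaligned(a0, 0x29)
    print(f"{name} ({size} bytes) at {a0:#x}: off by {off}, "
          f"curlwp+{a0 - CURLWP:#x}")
```

Under these assumptions both faults are 8-byte loads from addresses
that are only 4-byte aligned, and both addresses lie a short distance
above the reported curlwp value, which is itself not 8-byte aligned.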
Dmesg output for "last good" and "this" kernel will be
appended to this PR after initial submission.
>How-To-Repeat:
Update to today's -current on an SMP alpha system. Watch it
behave as one of the above.
>Fix:
Don't know, but something changed between Nov 14 and Nov 28
that has caused this bug.
>Release-Note:
>Audit-Trail:
>Unformatted: