Subject: kern/23372: mlxctl can panic NetBSD-1.6.1_STABLE/alpha
To: None <gnats-bugs@gnats.netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 11/04/2003 20:16:03
>Number: 23372
>Category: kern
>Synopsis: mlxctl can panic NetBSD-1.6.1_STABLE/alpha
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Nov 05 01:17:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Greg A. Woods
>Release: NetBSD 1.6.1_STABLE
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
System: NetBSD proven 1.6.1_STABLE
Architecture: alpha
Machine: alpha
>Description:
When I rebooted my alpha today (after trying to boot a CD that
apparently isn't bootable on an alpha), two of the three Mylex
RAID logical volumes (ld1 and ld2) came up offline.
While attempting to find out what was happening I tried checking
the controller status with "mlxctl", but the system panicked (see
the console log under How-To-Repeat).
I don't believe I had ever run "mlxctl" before.
Note I've used two of the logical volumes (ld1 & ld2)
extensively, one as /var/obj for a full system build and the
other to store some bulk data.
>How-To-Repeat:
NetBSD 1.6.2_RC1 (BUILDING) #12: Sat Nov 1 15:54:03 EST 2003
woods@building:/var/obj/BUILDING
AlphaServer 4000 5/400 4MB, 400MHz, s/n NI64906T1N
8192 byte page size, 2 processors.
total memory = 1536 MB
(2080 KB reserved for PROM, 1533 MB used by NetBSD)
avail memory = 1401 MB
using 9830 buffers containing 78640 KB of memory
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21164A-1
cpu0: VAX FP support, IEEE FP support, Primary Eligible
cpu0: Architecture extensions: 1<BWX>
cpu1 at mainbus0: ID 1, 21164A-2
cpu1: VAX FP support, IEEE FP support
cpu1: processor off-line; multiprocessor support not present in kernel
mcbus0 at mainbus0: 4MB BCache
mcmem0 at mcbus0 mid 1: Memory
mcpcia0 at mcbus0 mid 5: PCI Bridge
mcpcia0: Horse Revision 3, Left Handed Saddle Revision 3, CAP Revision 2
pci0 at mcpcia0 bus 0
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
siop0 at pci0 dev 1 function 0: Symbios Logic 53c810 (fast scsi)
siop0: interrupting at kn300 irq 36
scsibus0 at siop0: 8 targets, 8 luns per target
ppb0 at pci0 dev 2 function 0: Digital Equipment DECchip 21050 PCI-PCI Bridge (rev. 0x02)
pci1 at ppb0 bus 2
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
isp0 at pci1 dev 0 function 0: QLogic 1020 Ultra Wide SCSI HBA
isp0: interrupting at kn300 irq 40
scsibus1 at isp0: 16 targets, 8 luns per target
mlx0 at pci0 dev 3 function 0: Mylex RAID (v2 interface)
mlx0: interrupting at kn300 irq 44
mlx0: DAC960P/PD, 3 channels, firmware 2.49-0-00, 32MB RAM
ld0 at mlx0 unit 0: RAID6, online
ld0: 8182 MB, 4155 cyl, 64 head, 63 sec, 512 bytes/sect x 16756736 sectors
ld1 at mlx0 unit 1: RAID6, offline
ld1: disabled
ld2 at mlx0 unit 2: RAID5, offline
ld2: disabled
mcpcia1 at mcbus0 mid 4: PCI Bridge
mcpcia1: Horse Revision 3, Left Handed Saddle Revision 3, CAP Revision 2
pci2 at mcpcia1 bus 0
pci2: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pceb0 at pci2 dev 1 function 0: Intel 82375EB/SB PCI-EISA Bridge (PCEB) (rev. 0x05)
vga0 at pci2 dev 2 function 0: S3 Trio32/64 (rev. 0x00)
pci_mem_find: void region
pci_mem_find: void region
pci_mem_find: void region
pci_mem_find: void region
pci_mem_find: void region
wsdisplay0 at vga0 (kbdmux ignored)
tlp0 at pci2 dev 3 function 0: DECchip 21140 Ethernet, pass 1.2
tlp0: broken MicroWire interface detected; setting SROM size to 1Kb
tlp0: interrupting at kn300 irq 12
tlp0: DEC DE500-XA, Ethernet address 00:00:f8:1e:38:a7
tlp0: 10baseT, 100baseTX, 100baseTX-FDX, 10baseT-FDX
fpa0 at pci2 dev 4 function 0: DEC DEFPA PCI FDDI SAS Controller
fpa0: FDDI address 08:00:2b:b7:68:e8, FW=3.20, HW=1, SMT V7.2
fpa0: FDDI Port = S (PMD = ANSI Multi-Mode)
fpa0: interrupting at kn300 irq 16
eisa0 at pceb0
isa0 at pceb0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0 (mux ignored)
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 (mux ignored)
lpt0 at isa0 port 0x3bc-0x3bf irq 7
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
stray kn300 irq 40
scsibus0: waiting 2 seconds for devices to settle...
siop0: alloc newcdb at PHY addr 0x887d4000
st0 at scsibus0 target 0 lun 0: <DEC, TLZ09 (C)DEC, 0167> SCSI2 1/sequential removable
st0: drive empty
st0: sync (100.0ns offset 8), 8-bit (10.000MB/s) transfers
cd0 at scsibus0 target 5 lun 0: <DEC, RRD45 (C) DEC, 0436> SCSI2 5/cdrom removable
cd0: async, 8-bit transfers
scsibus1: waiting 2 seconds for devices to settle...
stray kn300 irq 40
sd0 at scsibus1 target 0 lun 0: <DEC, RZ29B (C) DEC, 0016> SCSI2 0/direct fixed
sd0: 4091 MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
stray kn300 irq 40
sd0: sync (100.0ns offset 12), 16-bit (20.000MB/s) transfers, tagged queueing
stray kn300 irq 40
raidattach: Asked for 8 units
Kernel internal RAIDframe activated
RAIDframe: Searching for raid components...
IPsec: Initialized Security Association Processing.
root on sd0a dumps on sd0b
mlx0: unit 1 offline
mountroot: trying nfs...
mountroot: trying msdos...
mountroot: trying cd9660...
mountroot: trying ffs...
readclock: 3/11/5/0/18/23=>1067991503 (1067986000)
root file system type: ffs
init: copying out path `/sbin/init' 11
mlx0: unit 0 online
mlx0: unit 1 offline
Type a quit character (usually ^\) to abort multi-user startup.
Tue Nov 4 19:18:25 EST 2003
swapctl: adding /dev/sd0b as swap device at priority 0
Starting file system checks:
/dev/rsd0a: file system is clean; not checking
/dev/rsd0d: file system is clean; not checking
/dev/rld0a: file system is clean; not checking
Can't open /dev/rld1a: Operation not supported by device
CAN'T CHECK FILE SYSTEM.
/dev/rld1a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
Can't open /dev/rld2a: Operation not supported by device
CAN'T CHECK FILE SYSTEM.
/dev/rld2a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
/dev/rld0d: file system is clean; not checking
/dev/rld0e: file system is clean; not checking
THE FOLLOWING FILE SYSTEMS HAD AN UNEXPECTED INCONSISTENCY:
ffs: /dev/rld1a (/build), ffs: /dev/rld2a (/mfbd)
Automatic file system check failed; help!
N O T I C E : Please do not use the console except to run shutdown!
We recommend creating a non-root account and using su(1) for root access.
Terminal type is vt100.
chmod: /tmp: Read-only file system
We recommend creating a non-root account and using su(1) for root access.
[console]<@> # mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
# mlxctl -a -v cstatus
DAC960P/PD, 3 chmlx_user_command: mlx_ccb_alloc = 35
annels, firmware
2.49-0-00, 32MBCPU 0: fatal kernel trap:
RAM
CPU 0 trap entry = 0x2 (memory management fault)
CPU 0 a0 = 0x14
CPU 0 a1 = 0x1
CPU 0 a2 = 0x0
CPU 0 pc = 0xfffffc0000344958
CPU 0 ra = 0xfffffc0000344750
CPU 0 pv = 0xfffffc00004667e0
CPU 0 curproc = 0xfffffc00088ffcc0
CPU 0 pid = 57, comm = mlxctl
panic: trap
Stopped in pid 57 (mlxctl) at cpu_Debugger+0x4: ret zero,(ra)
db> trace
cpu_Debugger() at cpu_Debugger+0x4
panic() at panic+0x168
trap() at trap+0x5fc
XentMM() at XentMM+0x20
--- memory management fault (from ipl 0) ---
mlx_user_command() at mlx_user_command+0x3d8
mlxioctl() at mlxioctl+0x2bc
spec_ioctl() at spec_ioctl+0x7c
vn_ioctl() at vn_ioctl+0x154
sys_ioctl() at sys_ioctl+0x4ec
syscall_plain() at syscall_plain+0x154
XentSys() at XentSys+0x58
--- syscall (54) ---
--- user mode ---
db>
By contrast, when I booted it on Nov 1 all three units attached online:
[Sat Nov 1 16:09:28 2003]mlx0 at pci0 dev 3 function 0: Mylex RAID (v2 interface)
[Sat Nov 1 16:09:28 2003]mlx0: interrupting at kn300 irq 44
[Sat Nov 1 16:09:28 2003]mlx0: DAC960P/PD, 3 channels, firmware 2.49-0-00, 32MB RAM
[Sat Nov 1 16:09:28 2003]ld0 at mlx0 unit 0: RAID6, online
[Sat Nov 1 16:09:28 2003]ld0: 8182 MB, 4155 cyl, 64 head, 63 sec, 512 bytes/sect x 16756736 sectors
[Sat Nov 1 16:09:28 2003]ld1 at mlx0 unit 1: RAID6, online
[Sat Nov 1 16:09:28 2003]ld1: 8182 MB, 4155 cyl, 64 head, 63 sec, 512 bytes/sect x 16756736 sectors
[Sat Nov 1 16:09:28 2003]ld2 at mlx0 unit 2: RAID5, online
[Sat Nov 1 16:09:28 2003]ld2: 28637 MB, 7272 cyl, 128 head, 63 sec, 512 bytes/sect x 58648576 sectors
>Fix:
unknown
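
One unconfirmed reading of the trace, offered only as a guess: the
"mlx_ccb_alloc = 35" message suggests the command block allocation
failed (35 is EAGAIN on NetBSD), and the small value in a0 (0x14) would
be consistent with a structure member being read through a NULL pointer
shortly afterwards, perhaps on the cleanup path. The sketch below is NOT
the mlx(4) driver code. The names struct ccb, ccb_alloc() and
user_command() are hypothetical stand-ins, and it only illustrates the
kind of guard that would avoid such a fault if that guess is right.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a controller command block (not struct mlx_ccb). */
struct ccb {
	int   flags;
	void *data;
};

/*
 * Hypothetical allocator: returns EAGAIN and sets *ccbp to NULL when no
 * command block is free, otherwise hands back a zeroed block.
 */
int
ccb_alloc(int nfree, struct ccb **ccbp)
{
	assert(ccbp != NULL);

	if (nfree <= 0) {
		*ccbp = NULL;
		return EAGAIN;
	}
	*ccbp = calloc(1, sizeof(**ccbp));
	return (*ccbp == NULL) ? ENOMEM : 0;
}

/*
 * Hypothetical ioctl-style handler with a shared cleanup label.  The
 * point is the NULL check under "out": without it, a failed allocation
 * would lead to cc->data being read through a NULL pointer.
 */
int
user_command(int nfree)
{
	struct ccb *cc = NULL;
	int rv;

	rv = ccb_alloc(nfree, &cc);
	if (rv != 0)
		goto out;

	cc->flags = 1;		/* pretend to set up and run the command */

out:
	if (cc != NULL) {	/* guard: cc is NULL if allocation failed */
		free(cc->data);
		free(cc);
	}
	return rv;
}
```

With this guard, user_command(0) simply returns EAGAIN to the caller
instead of faulting, and user_command(1) returns 0. Whether the real
driver fails in this way has not been verified against the source.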
>Release-Note:
>Audit-Trail:
>Unformatted: