Subject: kern/21900: Compaq Smart Array Kernel Panic
To: None <gnats-bugs@gnats.netbsd.org>
From: None <root@amalgam.dyndns.org>
List: netbsd-bugs
Date: 06/16/2003 16:18:37
>Number: 21900
>Category: kern
>Synopsis: Compaq Smart Array Kernel Panic
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jun 16 07:19:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Charlie Root
>Release: NetBSD 1.6Q (Built 04-03-2003)
>Organization:
>Environment:
System: NetBSD orion 1.6Q NetBSD 1.6Q (GENERIC) #0: Wed Apr 2 17:44:22 JST 2003 root@orion:/usr/local/netbsd-current/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
I have been receiving "biodone already" and "disk_unbusy" kernel panics
pretty regularly on several boxes on which I have attempted to install
NetBSD 1.6.1 and 1.6.1T. The panics occur during periods of heavy disk
read/write activity. Example crashes follow.
This report was generated on another machine, as the boxes with the problem
are too unstable to be used to create a report.
ld0f: error writing fsbn 90048 of 90048-90063 (ld0 bn 5597760; cn 1388 tn 21 sn 21)
ld0f: error writing fsbn 90048 of 90048-90063 (ld0 bn 5597760; cn 1388 tn 21 sn 21)
ld0: dk_busy < 0
panic: disk_unbusy
stopped in pid 227 (tar) at cpu_Debugger+0x4: leave
stopped in pid 227 (tar) at cpu_Debugger+0x5: ret
stopped in pid 227 (tar) at panic+0xad: jmp panic+0x118
stopped in pid 227 (tar) at panic+0x118: addl $-0x8,%esp
stopped in pid 227 (tar) at panic+0x11b: pushl $0
Hardware
Server 1:
Compaq Proliant 1850R (PIII 600) 128MB RAM
Compaq Smart Array 3200
4 x 9.1 GB Ultra2 SCSI HD (Tried RAID 0+1 and also RAID 5)
Server 2:
Compaq Proliant 1600 (PII 450) 128MB RAM
Compaq Smart Array 2/SL
5 x 4.3 GB Ultra2 SCSI HD (Both RAID 0+1 and RAID 5 have been tried)
Server 3:
Compaq Proliant 2500 (PPro 200) 256MB RAM
Compaq Smart Array 2/DH
3 x 4.3 GB Ultra2 SCSI HD (RAID 1 with hot spare)
I have tried these servers with:
- 1.6.1 and current
- Array acceleration enabled and disabled
Hand-typed copy of the trace and ps output after one crash:
panic: biodone already
Stopped at cpu_Debugger+0x4: leave
db> trace
cpu_Debugger(c4aa9488,6,ca9a2e40,c017b4f6,c3aa9488) at cpu_Debugger+0x4
panic(c0546622,c0a26200,c0a262b0,c0793ddc,c3aa9488) at panic+0xb8
biodone(c3aa9488,2000,100000,c0793ddc,c0a26200) at biodone+0x35
lddone(c3aa9488,c0793e08,c01b2c9c,c3aa9488) at lddone+0x05
ld_cac_done(c0a26200,c3aa9488,0,c01b2b2a,c09dda00) at ld_cac_done+0xc5
cac_ccb_done(c09dda00,ca9a2e40,c0793e68,0,c0a23e40) at cac_ccb_done+0x9f
cac_intr(c09dda00,0,c0790010,30,c0100010) at cac_intr+0x2a
Xintr_legacy10() at Xintr_legacy10+0xa8
--- interrupt ---
mpidle(c06d9560,0,c0793f6c,0,80000000) at mpidle
ltsleep(c06d93a0,4,c054de46,0,0) at ltsleep+0x207
uvm_scheduler(c078f010,78f000,798000,0,0) at uvm_scheduler+0x75
main(0,0,0,0,0) at main+0x69e
db> ps
PID   PPID  PGRP  UID  S  FLAGS    LWPS  COMMAND     WAIT
447   446   446   0    2  0x4002   1     gzip        pipdwt
446   364   446   0    2  0x4002   1     tar         biowait
380   409   409   0    2  0x4002   1     gzip        pipdwt
409   342   409   0    2  0x4002   1     tar         biowait
405   368   405   0    2  0x4002   1     rm          biowait
375   1     375   0    2  0x4002   1     getty       ttyin
364   1     364   0    2  0x4003   1     csh         pause
342   1     342   0    2  0x4003   1     csh         pause
368   1     368   0    2  0x4003   1     csh         pause
344   1     344   0    2  0        1     cron        nanosle
334   1     334   0    2  0        1     inetd       kqread
175   1     175   0    2  0        1     syslogd     biowait
125   1     125   0    2  0        1     dhclient    select
10    0     0     0    2  0x20200  1     aiodoned    aiodone
9     0     0     0    2  0x20200  1     ioflush
8     0     0     0    2  0x20200  1     reaper      reaper
7     0     0     0    2  0x20200  1     pagedaemon  pgdaemo
6     0     0     0    2  0x20200  1     lfs_writer  lfswrit
5     0     0     0    2  0x20200  1     pms0        pmsrese
4     0     0     0    2  0x20200  1     atapibus0   sccomp
3     0     0     0    2  0x20200  1     scsibus1    sccomp
1     0     1     0    2  0x4000   1     init        wait
0     -1    0     0    2  0x20200  1     swapper     schedule
db>
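For readers unfamiliar with the two panic messages, here is a small
userland model in C of the sanity checks that produce them. It is only a
sketch, not the kernel source: every model_* name is hypothetical, and it
assumes the completion path (lddone in the trace) drops the disk busy count
before it marks the buffer done. It also assumes, as a hypothesis this
report does not confirm, that the controller driver completes the same
request twice. Under those assumptions either panic can result, depending
on how many requests are in flight.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel's buffer and disk state. */
struct model_buf {
	bool done;	/* models the "already completed" flag on a buffer */
};

struct model_disk {
	int busy;	/* models dk_busy: number of requests in flight */
};

enum model_result {
	MODEL_OK,
	MODEL_PANIC_BIODONE,	/* stands in for panic: biodone already */
	MODEL_PANIC_UNBUSY	/* stands in for dk_busy < 0, panic: disk_unbusy */
};

/* A request is handed to the disk: one more in flight. */
static void
model_disk_busy(struct model_disk *dk)
{
	dk->busy++;
}

/* A request finished: complain if nothing was in flight. */
static enum model_result
model_disk_unbusy(struct model_disk *dk)
{
	if (dk->busy-- == 0)
		return MODEL_PANIC_UNBUSY;
	return MODEL_OK;
}

/* Mark a buffer done: complain if it was already done. */
static enum model_result
model_biodone(struct model_buf *bp)
{
	if (bp->done)
		return MODEL_PANIC_BIODONE;
	bp->done = true;
	return MODEL_OK;
}

/* Assumed completion order: drop the busy count, then finish the buffer. */
static enum model_result
model_complete(struct model_disk *dk, struct model_buf *bp)
{
	enum model_result r = model_disk_unbusy(dk);

	if (r != MODEL_OK)
		return r;
	return model_biodone(bp);
}

/* Complete one buffer twice while `outstanding` requests are in flight. */
static enum model_result
model_double_complete(int outstanding)
{
	struct model_disk dk = { 0 };
	struct model_buf bp = { false };
	enum model_result r;
	int i;

	for (i = 0; i < outstanding; i++)
		model_disk_busy(&dk);
	r = model_complete(&dk, &bp);
	if (r != MODEL_OK)
		return r;
	return model_complete(&dk, &bp);
}
```

In this model, model_double_complete(1) trips the busy-count check and
model_double_complete(2) trips the already-done check, which would be
consistent with seeing both panics on the same hardware.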
>How-To-Repeat:
Heavy disk write activity on a system using a Compaq Smart Array 2/SL,
2/DH, or 3200 seems to be all that is necessary to induce the kernel
panic.
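The exact commands used are not recorded here (the ps output above shows
tar, gzip and rm running at the time), so the following is only a sketch
of one way to generate sustained write load from C. The function name
write_load, the fill byte, and the block sizes in the usage note are
assumptions chosen for illustration, not values known to trigger the panic.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `count` blocks of `bs` bytes to `path`, then flush them to disk.
 * Returns the number of bytes written, or -1 on any error.
 */
static long long
write_load(const char *path, size_t bs, size_t count)
{
	long long total = 0;
	char *block;
	size_t i;
	int fd;

	block = malloc(bs);
	if (block == NULL)
		return -1;
	memset(block, 0xA5, bs);	/* arbitrary fill pattern */

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0) {
		free(block);
		return -1;
	}

	for (i = 0; i < count; i++) {
		ssize_t n = write(fd, block, bs);

		/* Treat short writes as failures to keep the sketch simple. */
		if (n < 0 || (size_t)n != bs) {
			total = -1;
			break;
		}
		total += n;
	}

	/* Force the data out to the device rather than leaving it cached. */
	if (fsync(fd) != 0)
		total = -1;
	if (close(fd) != 0)
		total = -1;
	free(block);
	return total;
}
```

For example, calling write_load("/mnt/array/load.0", 65536, 16384) from a
few processes at once on a filesystem backed by the array would write
about 1 GB per process; the path is a placeholder.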
>Fix:
No known work-around.
>Release-Note:
>Audit-Trail:
>Unformatted: