Subject: raidframe and pciide list interrupts
To: None <current-users@netbsd.org>
From: Simon Burge <simonb@wasabisystems.com>
List: current-users
Date: 11/22/2000 04:05:37
Hi,
One half of my raidframe mirror across wd0 and wd1 (a pair of IBM 46GB
disks) on my Alpha PC164 running 1.5_BETA2 just died with:
Nov 22 02:44:32 thoreau /netbsd: pciide0:0:0: lost interrupt
Nov 22 03:10:30 thoreau /netbsd: type: ata tc_bcount: 8192 tc_skip: 0
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x21
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: device timeout, c_bcount=8192, c_skip0
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
(wd0 bn 5275776; cn 5233 tn 14 sn 30)
Nov 22 03:10:30 thoreau /netbsd: raid0: IO Error. Marking /dev/wd0a as failed.
Nov 22 03:10:30 thoreau /netbsd: raid0: node (Rmir) returned fail, rolling backward
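For anyone reading the log above, here is a small sketch that decodes the two status bytes. It assumes the standard ATA task-file status bits (older ATA naming; several bits are obsolete in later revisions) and the SFF-8038i bus-master IDE status bits. The names in the tables come from those specs, not from NetBSD's pciide code. On that reading, status=0x21 is "transfer still active, drive 0 DMA capable" with no interrupt bit set, and st=0x80 is BSY alone. That appears consistent with the "missing interrupt" and "reset failed" messages, i.e. the drive never came back from busy.

```python
# Bit names for the ATA task-file status register (older ATA naming;
# DSC, CORR and IDX are obsolete in later revisions of the standard).
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # device seek complete
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data
    0x02: "IDX",   # index
    0x01: "ERR",   # error; details are in the error register
}

# Bit names for the bus-master IDE status register (SFF-8038i).
# Bits 3 and 4 are reserved.
BMIDE_STATUS_BITS = {
    0x80: "SIMPLEX",   # controller is simplex-only
    0x40: "DRV1_DMA",  # drive 1 DMA capable
    0x20: "DRV0_DMA",  # drive 0 DMA capable
    0x04: "INTR",      # drive has asserted its interrupt
    0x02: "ERROR",     # DMA transfer error
    0x01: "ACTIVE",    # bus-master transfer still in progress
}


def decode(value, bits):
    """Return the names of the bits set in value, highest bit first."""
    return [name for mask, name in sorted(bits.items(), reverse=True)
            if value & mask]


if __name__ == "__main__":
    print("st=0x80     ->", decode(0x80, ATA_STATUS_BITS))
    print("status=0x21 ->", decode(0x21, BMIDE_STATUS_BITS))
```

Run as a script, it prints `['BSY']` for st=0x80 and `['DRV0_DMA', 'ACTIVE']` for status=0x21.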
This continued for about 10 minutes, with lots of pciide and wd0 errors
interspersed with the following raidframe errors:
Nov 22 03:10:30 thoreau /netbsd: raid0: IO Error. Marking /dev/wd0a as failed.
Nov 22 03:10:30 thoreau /netbsd: raid0: node (Rmir) returned fail, rolling backward
Nov 22 03:10:30 thoreau /netbsd: raid0: DAG failure: r addr 0x508040 (5275712) nblk 0x10 (16) buf 0xfffffe000307c000
Nov 22 03:11:22 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
Nov 22 03:12:14 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
Nov 22 03:12:14 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
Nov 22 03:15:41 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
Nov 22 03:20:01 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
and now seems to be ignoring wd0 altogether.
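The raid0 and wd0 messages appear to refer to the same read. Below is a sketch of the arithmetic under two assumptions, neither of which is stated in the logs: RAIDframe skips 64 reserved sectors at the start of each component (`RF_PROTECTED_SECTORS` is used here as an assumed constant), and the drive reports a 16-head, 63-sector geometry. Both assumptions happen to reproduce the logged values. On that basis, raid0 address 0x508040 (5275712) becomes wd0a block 5275776, which is the 16-block (8192-byte) read that pciide timed out on. Since "fsbn" and "wd0 bn" are equal, wd0a presumably starts at sector 0.

```python
# Assumed: sectors RAIDframe reserves at the start of each component.
RF_PROTECTED_SECTORS = 64

# Assumed geometry (16 heads x 63 sectors/track): not stated in the post,
# but it reproduces the "cn 5233 tn 14 sn 30" printed by the kernel.
HEADS = 16
SECTORS_PER_TRACK = 63


def raid1_to_component(raid_sector):
    """Map a sector on a two-component RAID 1 set to a component sector.

    With a single mirror pair, array sector N sits at sector N on each
    component, shifted past the protected area.
    """
    return raid_sector + RF_PROTECTED_SECTORS


def block_to_chs(bn, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Split an absolute block number into (cylinder, track, sector).

    The sector is the 0-based remainder, which is what the log's
    "sn 30" appears to be.
    """
    cyl, rem = divmod(bn, heads * spt)
    track, sector = divmod(rem, spt)
    return cyl, track, sector


if __name__ == "__main__":
    raid_addr = 0x508040                # "r addr 0x508040 (5275712)"
    comp = raid1_to_component(raid_addr)
    print(raid_addr, "->", comp)        # the wd0a fsbn in the log
    print(block_to_chs(comp))
```

Run as a script, it prints `5275712 -> 5275776` and `(5233, 14, 30)`.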
So, a couple of questions:
1) Shouldn't raidframe have stopped accessing wd0 after the first
"Marking /dev/wd0a as failed"?
2) Is the disk hosed? Sleep time now; I'll reboot in the morning and
see what happens.
Simon.
--
Simon Burge <simonb@wasabisystems.com>
NetBSD Sales, Support and Service: http://www.wasabisystems.com/