Subject: Re: raidframe and pciide list interrupts
To: Simon Burge <simonb@wasabisystems.com>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: current-users
Date: 11/21/2000 21:06:33
On Wed, Nov 22, 2000 at 04:05:37AM +1100, Simon Burge wrote:
> Hi,
> 
> One half of my raidframe mirror across wd0 and wd1 (a pair of IBM 46GB
> disks) on my Alpha PC164 running 1.5_BETA2 just died with:
> 
> Nov 22 02:44:32 thoreau /netbsd: pciide0:0:0: lost interrupt
> Nov 22 03:10:30 thoreau /netbsd:        type: ata tc_bcount: 8192 tc_skip: 0
> Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x21
> Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: device timeout, c_bcount=8192, c_skip0
> Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
> Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
> 				(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
> Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
> Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
> Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
> 				(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
> Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
> Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
> Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
> 				(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
> Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
> Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
> Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
> 				(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
> Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
> Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
> Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
> 				(wd0 bn 5275776; cn 5233 tn 14 sn 30), retrying
> Nov 22 03:10:30 thoreau /netbsd: pciide0:0:0: not ready, st=0x80, err=0x00
> Nov 22 03:10:30 thoreau /netbsd: pciide0 channel 0: reset failed for drive 0
> Nov 22 03:10:30 thoreau /netbsd: wd0a: device timeout reading fsbn 5275776 of 5275776-5275791
> 				(wd0 bn 5275776; cn 5233 tn 14 sn 30)
> Nov 22 03:10:30 thoreau /netbsd: raid0: IO Error.  Marking /dev/wd0a as failed.
> Nov 22 03:10:30 thoreau /netbsd: raid0: node (Rmir) returned fail, rolling backward
> 
> This continued for about 10 minutes with lots of pciide and wd0 errors
> interspersed with the following raidframe errors:
> 
> Nov 22 03:10:30 thoreau /netbsd: raid0: IO Error.  Marking /dev/wd0a as failed.
> Nov 22 03:10:30 thoreau /netbsd: raid0: node (Rmir) returned fail, rolling backward
> Nov 22 03:10:30 thoreau /netbsd: raid0: DAG failure: r addr 0x508040 (5275712) nblk 0x10 (16) buf 0xfffffe000307c000
> Nov 22 03:11:22 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
> Nov 22 03:12:14 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
> Nov 22 03:12:14 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
> Nov 22 03:15:41 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
> Nov 22 03:20:01 thoreau /netbsd: raid0: node (Wpd) returned fail, rolling forward
> 
> and now seems to be ignoring wd0 altogether.
> 
> So, a couple of questions:
> 
>  1) Shouldn't raidframe have stopped accessing wd0 after the first
>     "Marking /dev/wd0a as failed"?
> 
>  2) Is the disk hosed?  Sleep time now - I'll reboot in the morning and
>     see what happens.

The "reset failed" messages don't sound good. The drive is no longer
on the bus at all. I'd say its IDE interface is dead. Or maybe it's just the
cable?
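
For what it's worth, the st=0x80 in the "not ready" lines looks like the ATA
status register with only the busy bit set, which would fit a drive that never
comes back after a reset. A minimal sketch to decode such a value, assuming the
standard ATA status bit layout (the table and function names below are made up
for illustration, they are not taken from the driver):

```python
# Bit layout of the 8-bit ATA status register (standard ATA definition).
# ATA_STATUS_BITS and decode_ata_status are hypothetical names for this sketch.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault (write fault on older drives)
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error, details in the error register
]


def decode_ata_status(st):
    """Return the names of the bits set in an 8-bit ATA status value."""
    return [name for mask, name in ATA_STATUS_BITS if st & mask]


if __name__ == "__main__":
    # Value taken from the "not ready, st=0x80, err=0x00" log lines.
    print(decode_ata_status(0x80))
```

Note that this only applies to the st= value; the status=0x21 in the bus-master
DMA error line comes from a different register and is not decoded here.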

--
Manuel Bouyer <bouyer@antioche.eu.org>
--