Subject: Re: Horrible RAIDFrame Crash
To: Caffeinate The World <mochaexpress@yahoo.com>
From: Greg Oster <oster@cs.usask.ca>
List: current-users
Date: 04/15/2003 13:35:38
Caffeinate The World writes:
>
> --- Caffeinate The World <mochaexpress@yahoo.com> wrote:
> >
> > --- Caffeinate The World <mochaexpress@yahoo.com> wrote:
> > I unplugged the SCSI connector from sd0 and booted the system up
> > again.
> > It booted up fine with the failed component errors. So sd1 is fine.
> >
> > What can I do to further narrow down the problem? Apparently it's sd0,
> > and it could be the write process that caused the "Multiple disks
> > failed" error. I get the feeling that if I repeat building sd0 as the
> > spare, I'll get the same errors.
>
> I unplugged the SCSI cable from sd0 and booted up the system. It booted
> fine. I shut down to single-user mode, plugged the SCSI cable back into
> sd0, and ran "scsictl scsibus0 scan any any". It found sd0 fine.
>
> I tried again to add sd0a as a hot spare to raid0:
>
> raidctl -a /dev/sd0a raid0
> warning: truncating spare disk /dev/sd0a to 1023872 blocks
>
> NOTE: sd0a has the same layout and size as sd1a, which raid0 uses, so
> that truncation warning doesn't make sense.
What happens is that RAIDframe 'truncates' the component down to a multiple of
the stripe width. So it probably truncated the component on sd1 as well.
Not a problem.
> raidctl -vF component0 raid0
> It started the reconstruction and was at 2% when, after a flood of
> ...fast scrolling errors..., it printed:
>
> recon read failed
> panic: raidframe error at line 1314 file
> /usr/src/sys/dev/raidframe/rf_reconstruct.c
This *is* an error in reading from a block on sd1. You might try
doing:
dd if=/dev/rsd1d of=/dev/null bs=1m
and see whether that errors out too. You might also check whether
any of the logs in /var/log mention a failed read.
> syncing disks... Multiple disks failed in a single group! Aborting I/O
> operation
Yup. As far as RAIDframe is concerned, it can't do anything with that RAID set
once another disk (in this case, the last remaining one) has failed.
> Multiple disks failed...operation [repeated 17 times]
>
> panic raidframe error at line 471 file
> /usr/src/sys/dev/raidframe/rf_states.c
>
> P.S. I started the "NetBSD nightmare" thread about FFS2; I guess this is
> the sequel: NetBSD Nightmare II. :(
No... It's most likely "The Hardware Nightmare" :(
[I'll answer your other postings later this evening...]
Later...
Greg Oster