NetBSD-Bugs archive
Re: kern/40569: Failed RAIDframe parity rewrite prevents system shutdown
The following reply was made to PR kern/40569; it has been noted by GNATS.
From: Greg Oster <oster%cs.usask.ca@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc:
Subject: Re: kern/40569: Failed RAIDframe parity rewrite prevents system shutdown
Date: Fri, 06 Feb 2009 18:47:48 -0600
tron%zhadum.org.uk@localhost writes:
> >Number: 40569
> >Category: kern
> >Synopsis: Failed RAIDframe parity rewrite prevents system shutdown
> >Confidential: no
> >Severity: serious
> >Priority: medium
> >Responsible: kern-bug-people
> >State: open
> >Class: sw-bug
> >Submitter-Id: net
> >Arrival-Date: Fri Feb 06 23:05:00 +0000 2009
> >Originator: Matthias Scheler
> >Release: NetBSD 5.0_RC1 2009-02-03 sources
> >Organization:
> Matthias Scheler http://zhadum.org.uk/
> >Environment:
> System: NetBSD colwyn.zhadum.org.uk 5.0_RC1 NetBSD 5.0_RC1 (COLWYN.64) #0:
> Fri Feb 6 17:59:15 GMT 2009
> tron%colwyn.zhadum.org.uk@localhost:/src/sys/compile/COLWYN.64 amd64
> Architecture: x86_64
> Machine: amd64
> >Description:
> One of the SATA disks in my server had a few write errors and was ejected
> from a RAIDframe RAID 1 a few days ago. When I finally noticed this
> morning I initiated a parity rewrite with "raidctl -R /dev/wd2e raid1".
> The rebuild failed unfortunately:
>
> raid1: initiating in-place reconstruction on column 0
> wd2e: error writing fsbn 268435392 of 268435392-268435519 (wd2 bn 268435455; cn 266305 tn 0 sn 15), retrying
> [...]
> wd2e: error writing fsbn 268435392 of 268435392-268435519 (wd2 bn 268435455; cn 266305 tn 0 sn 15)
> wd2: (id not found)
> raid1: Recon write failed!
> raid1: reconstruction failed.
>
> I retried the parity rewrite but it was rejected by "raidctl" because of
> an invalid I/O control.
Do you have a bit more info on exactly what you tried here and what
the error was? A parity rewrite shouldn't have bumped
reconInProgress.
> The reconstruction was not tried again. When
> I later tried to shut down the system (to check the cabling), the kernel
> stopped while unmounting the file systems with this message:
>
> unmounting file systems...raid1: Waiting for reconstruction to stop...
>
> I had to cut the power at this point.
>
> >How-To-Repeat:
> Use "raidctl -R /dev/<x> raid<y>" and try to shut down the system afterwards.
I suspect the reconstruction also needs to fail, and you may need to
attempt to do something else again... but I'm not sure yet...
(I can't see how reconInProgress is non-zero in rf_driver.c unless
there really is a reconstruction going on... From what you describe
here there wasn't an active reconstruction going on, and so I have no
clue how it could get into that state... :( )
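To make the symptom concrete, here is a small user-space model of the
situation described above. It is only a sketch under stated assumptions,
not the code in rf_driver.c and not a confirmed diagnosis: the struct, the
function names, and the "release" flag are all hypothetical. It shows that
if some path bumps a reconInProgress-style counter and a failure path
returns without dropping it again, a shutdown that waits for the counter
to reach zero can never proceed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-array state; not the real RAIDframe struct. */
struct raid_model {
    int recon_in_progress;   /* models a reconInProgress-style counter */
};

/* Starting a reconstruction bumps the counter. */
void recon_start(struct raid_model *r)
{
    r->recon_in_progress++;
}

/*
 * Failure path. When release is false the counter is left bumped,
 * which is the (assumed) imbalance this sketch illustrates.
 */
void recon_fail(struct raid_model *r, bool release)
{
    if (release)
        r->recon_in_progress--;
}

/*
 * Models "Waiting for reconstruction to stop...": poll the counter a
 * bounded number of times and report whether shutdown could proceed.
 * A real kernel would sleep between polls and might not give up at all.
 */
bool shutdown_can_proceed(const struct raid_model *r, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (r->recon_in_progress == 0)
            return true;
    }
    return false;
}
```

With a balanced failure path (recon_start followed by recon_fail with
release set to true) shutdown_can_proceed reports true. With release set
to false it reports false however many polls are allowed, which matches
the hang seen at "Waiting for reconstruction to stop...".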
Later...
Greg Oster