Subject: Re: URGEND: raid 5 array failed due to power outage
To: Uwe Lienig <uwe.lienig@fif.mw.htw-dresden.de>
From: Greg Oster <oster@cs.usask.ca>
List: port-alpha
Date: 08/26/2004 09:49:47
Uwe Lienig writes:
> Hello Tobias,
> hello Greg,
> hi alpha gurus (for the hardware question)
>
> Thanks a lot for your answers. Since Greg added that it would be wise to
> rebuild the RAID in degraded mode, I'd like to ask how I have to
> reconstruct. My thoughts are as follows:
>
> $ > raidctl -C /etc/raid0.conf raid0
> $ > raidctl -I 20040825 raid0
> $ > raidctl -f /dev/sd12b raid0
This should be ok.
> After that I would check the disklabel.
> Mount everything read-only, back up the file systems and then bring the RAID
> back to life.
>
> To bring the raid back to life I would do:
> $ > # comment
> $ > # first reconstruct the sd12b to the spare
> $ > raidctl -F /dev/sd12b raid0
> $ > # then, if necessary, replace sd12b and rebuild the raid
> $ > raidctl -B raid0
> $ > # the raid should be back in normal operation
> Please verify if this procedure would work as expected.
You'd be better off using:
raidctl -R /dev/sd12b raid0
to rebuild back on top of sd12b. Copyback works, but has some
serious limitations (e.g. no I/O to the RAID set while the copyback is
happening!) and needs to be replaced.
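For reference, the sequence discussed so far could be wrapped up roughly like this. It is only a sketch: the function names and the `run` helper are made up for illustration, it only prints the commands unless DRY_RUN=0 is set, and it assumes the set is otherwise in a state where forcing the configuration is the right thing to do.

```shell
#!/bin/sh
# Dry-run sketch of the raidctl steps discussed in this thread.
# Nothing is executed unless DRY_RUN=0 is set explicitly.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical helper: print the command, execute only when asked.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: force-configure the set, relabel it, and fail one component.
# Arguments: config file, new serial number, component, RAID device.
configure_degraded() {
    conf=$1
    serial=$2
    comp=$3
    dev=$4
    run raidctl -C "$conf" "$dev"     # force the configuration
    run raidctl -I "$serial" "$dev"   # initialize the component labels
    run raidctl -f "$comp" "$dev"     # mark the component as failed
}

# Step 2 (only after the read-only backup): rebuild in place.
# Arguments: component, RAID device.
rebuild_in_place() {
    comp=$1
    dev=$2
    run raidctl -R "$comp" "$dev"     # reconstruct back onto the same component
}
```

Calling `configure_degraded /etc/raid0.conf 20040825 /dev/sd12b raid0` and, once the backup is done, `rebuild_in_place /dev/sd12b raid0` prints the four commands from this thread in order. The two steps are kept separate so the disklabel check and the backup can happen in between.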
However: your other email indicates:
> Aug 23 11:35:49 lwfv-fs /netbsd: Kernelized RAIDframe activated
> Aug 23 11:35:49 lwfv-fs /netbsd: root on sd0a dumps on sd0b
> Aug 23 11:35:49 lwfv-fs /netbsd: root file system type: ffs
> Aug 23 11:35:49 lwfv-fs /netbsd: RAIDFRAME: protectedSectors is 64
> Aug 23 11:35:49 lwfv-fs /netbsd: raid0: Component /dev/sd10b being configured at row: 0 col: 0
> Aug 23 11:35:49 lwfv-fs /netbsd: Row: 0 Column: 0 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:49 lwfv-fs /netbsd: Version: 2 Serial Number: 2004072901 Mod Counter: 225
> Aug 23 11:35:49 lwfv-fs /netbsd: Clean: No Status: 0
> Aug 23 11:35:49 lwfv-fs /netbsd: /dev/sd10b is not clean!
> Aug 23 11:35:49 lwfv-fs /netbsd: raid0: Component /dev/sd11b being configured at row: 0 col: 1
> Aug 23 11:35:49 lwfv-fs /netbsd: Row: 0 Column: 1 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:49 lwfv-fs /netbsd: Version: 2 Serial Number: 2004072901 Mod Counter: 225
> Aug 23 11:35:49 lwfv-fs /netbsd: Clean: No Status: 0
> Aug 23 11:35:49 lwfv-fs /netbsd: /dev/sd11b is not clean!
> Aug 23 11:35:49 lwfv-fs /netbsd: raid0: Component /dev/sd12b being configured at row: 0 col: 2
> Aug 23 11:35:49 lwfv-fs /netbsd: Row: 0 Column: 2 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:49 lwfv-fs /netbsd: Version: 2 Serial Number: 2004072901 Mod Counter: 225
> Aug 23 11:35:49 lwfv-fs /netbsd: Clean: No Status: 0
> Aug 23 11:35:49 lwfv-fs /netbsd: /dev/sd12b is not clean!
> Aug 23 11:35:49 lwfv-fs /netbsd: raid0: Component /dev/sd30b being configured at row: 0 col: 3
> Aug 23 11:35:49 lwfv-fs /netbsd: Row: 0 Column: 3 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:49 lwfv-fs /netbsd: Version: 2 Serial Number: 2004072901 Mod Counter: 228
> Aug 23 11:35:49 lwfv-fs /netbsd: Clean: No Status: 0
> Aug 23 11:35:49 lwfv-fs /netbsd: /dev/sd30b has a different modfication count: 225 228
> Aug 23 11:35:49 lwfv-fs /netbsd: /dev/sd30b is not clean!
> Aug 23 11:35:49 lwfv-fs /netbsd: raid0: Component /dev/sd31b being configured at row: 0 col: 4
> Aug 23 11:35:49 lwfv-fs /netbsd: Row: 0 Column: 4 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:49 lwfv-fs /netbsd: Version: 2 Serial Number: 2004072901 Mod Counter: 228
> Aug 23 11:35:49 lwfv-fs /netbsd: Clean: No Status: 0
> Aug 23 11:35:50 lwfv-fs /netbsd: /dev/sd31b has a different modfication count: 225 228
> Aug 23 11:35:50 lwfv-fs /netbsd: /dev/sd31b is not clean!
> Aug 23 11:35:50 lwfv-fs /netbsd: raid0: Component /dev/sd32b being configured at row: 0 col: 5
> Aug 23 11:35:50 lwfv-fs /netbsd: Row: 0 Column: 5 Num Rows: 1 Num Columns: 6
> Aug 23 11:35:50 lwfv-fs /netbsd: Version: 2 Serial Number: 2004072901 Mod Counter: 228
> Aug 23 11:35:50 lwfv-fs /netbsd: Clean: No Status: 0
> Aug 23 11:35:50 lwfv-fs /netbsd: /dev/sd32b has a different modfication count: 225 228
> Aug 23 11:35:50 lwfv-fs /netbsd: /dev/sd32b is not clean!
> Aug 23 11:35:50 lwfv-fs /netbsd: RAIDFRAME: Configure (RAID Level 5): total number of sectors is 179207680 (87503 MB)
> Aug 23 11:35:50 lwfv-fs /netbsd: RAIDFRAME(RAID Level 5): Using 20 floating recon bufs with head sep limit 10
This RAID set should *never* have configured, and I'm not sure why it
did. [time passes] Ok, the "old config" code has a bug, which is
all the more reason for everyone to be using the autoconfig code.
[I *really* need to nuke that old code...]
> Aug 23 09:27:35 lwfv-fs last message repeated 6 times
> Aug 23 10:13:04 lwfv-fs syslogd: Exiting on signal 15
Was this a crash, or a reboot, or a hang, or??? (I'm just trying to
figure out why the mod counters would be out by 3. I can understand
them being out by 1, but never by 3 for the scenario you present.)
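A rough way to pull the per-component mod counters out of a log like the one quoted above. This is a sketch only: it assumes each kernel message sits on a single unwrapped line on standard input, that every "Component" line is followed by that component's "Mod Counter" line, and the function names are invented for illustration.

```shell
#!/bin/sh
# Print one "<component> <mod counter>" line per RAIDframe component,
# reading unwrapped syslog lines from standard input.
mod_counters() {
    awk '
        # Remember the component named by the most recent "Component" line.
        /raid[0-9]+: Component / {
            for (i = 1; i <= NF; i++)
                if ($i == "Component") comp = $(i + 1)
        }
        # The label line that follows ends with that component counter.
        /Mod Counter:/ {
            if (comp != "") print comp, $NF
            comp = ""
        }
    '
}

# Print how many distinct mod counter values appear in the input.
distinct_counters() {
    mod_counters | awk '{ print $2 }' | sort -u | wc -l | tr -d ' '
}
```

Fed the log from this thread, `mod_counters` should list sd10b, sd11b and sd12b at 225 and sd30b, sd31b and sd32b at 228, and `distinct_counters` reports how many different values are present, which makes a disagreement between components easy to spot.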
I'm not sure which way to suggest going right now... I still need
more info...
Later...
Greg Oster