Subject: Re: anyone know if there's a fix for this "malloc with held simple_lock" in RAIDframe bug yet?
To: Greg Oster <oster@cs.usask.ca>
From: Greg A. Woods <woods@weird.com>
List: port-alpha
Date: 03/16/2005 01:50:41
[ On Tuesday, March 15, 2005 at 15:46:56 (-0600), Greg Oster wrote: ]
> Subject: Re: anyone know if there's a fix for this "malloc with held simple_lock" in RAIDframe bug yet?
>
> If you build a new 'raidctl' (actually... you might not need one, but
> whatever) then you can use the word 'absent' as a "disk does not
> exist" place-holder.
Yeah! (and the old raidctl worked fine -- it contains no relevant
changes, though in the end I backported it too, mostly for the manual
page fixups....)
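For reference, a raidctl config file for a two-column RAID-1 set with one component marked "absent" might look roughly like the sketch below. It is a hedged illustration, not the configuration actually used here. The layout and queue values are read back from the component labels shown later (sectPerSU 128, SUsPerPU 1, SUsPerRU 1, RAID Level 1, Queue size 100). Column 0 is /dev/sd1a and column 1 is the absent disk, which matches the later "component1: failed" status. The "fifo" queue type, the file name, and listing /dev/sd0a in a spare section (rather than adding it afterwards with "raidctl -a") are assumptions. See raidctl(8) for the authoritative format.

```
# Hypothetical raid0.conf sketch: column 0 is the new disk, column 1 is
# the original disk, not yet available, marked with the 'absent' keyword.
START array
# numRow numCol numSpare
1 2 1

START disks
/dev/sd1a
absent

START spare
/dev/sd0a

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 1

START queue
fifo 100
```

A file like this would typically be loaded with "raidctl -C raid0.conf raid0", since forced configuration is needed the first time a set is configured. "raidctl -I <serial> raid0" would then write the component labels, and "raidctl -A root raid0" would set the Autoconfig and Root partition flags seen in the labels below. Check raidctl(8) on your own release before relying on these.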
OK, anyway, so far, so good and everything seems to be working properly
now.
I've constructed the new RAID-1 device, populated it with copies of the
installed filesystems, booted successfully from it, and I am now
reconstructing to the original install disk. The only weird thing is
the negative time estimate! ;-)
[console]<@> # raidctl -v -s raid0
Components:
/dev/sd1a: optimal
component1: failed
Spares:
/dev/sd0a: spare
Component label for /dev/sd1a:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 1412893, Mod Counter: 106
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 71131904
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
component1 status is: failed. Skipping label.
/dev/sd0a status is: spare. Skipping label.
Parity status: DIRTY
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
[console]<@> # raidctl -v -F component1 raid0
Reconstruction sraid0: vnode was NULL
RECON: initiating reconstruction on col 1 -> spare at col 2
tatus:
0% | | ETA: -42:-536 |
And in any case "systat vm" is much more interesting to watch:
Disks:   fd0   cd0   sd0   sd1   sd2
seeks
xfers                965   964
bytes                60M   60M
%busy               99.9  78.1
I really like those numbers! ;-)
(now why can't the filesystem move data that fast?)
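A quick sanity check on those systat figures: dividing the byte rate by the transfer rate gives the average transfer size. That size comes out very close to one 64 KiB stripe unit (sectPerSU 128 x 512-byte blocks, from the component label above). This is only a back-of-the-envelope sketch. The figures are a single rounded snapshot, and reading "60M" as mebibytes per second is an assumption.

```python
# Back-of-the-envelope check of the "systat vm" snapshot quoted above.
# Values are copied from that snapshot and from the component label;
# interpreting "60M" as 60 MiB/s is an ASSUMPTION (the display is rounded).

SECTOR_BYTES = 512          # "blocksize: 512" in the component label
SECT_PER_SU = 128           # "sectPerSU: 128" in the component label

xfers_per_sec = 965         # "xfers" figure for the busier of the two active disks
bytes_per_sec = 60 * 2**20  # "bytes" figure "60M", assumed to be MiB/s

stripe_unit_bytes = SECT_PER_SU * SECTOR_BYTES
avg_xfer_bytes = bytes_per_sec / xfers_per_sec

print(f"stripe unit:      {stripe_unit_bytes} bytes")
print(f"average transfer: {avg_xfer_bytes:.0f} bytes")
print(f"ratio:            {avg_xfer_bytes / stripe_unit_bytes:.3f}")
```

A ratio this close to 1 is consistent with reconstruction issuing roughly one stripe-unit-sized transfer at a time. Given the rounding in the display, it is suggestive rather than conclusive.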
hmmm... it is slowing down, must be near the end -- and it's done:
raid0: Reconstruction of disk at col 1 completed
raid0: Recon time was 618.804143 seconds, accumulated XOR time was 0 us (0.000000)
raid0: (start time 1110953957 sec 955365 usec, end time 1110954576 sec 759508 usec)
raid0: Total head-sep stall count was 0
raid0: 734085 recon event waits, 1 recon delays
raid0: 1110953957863819 max exec ticks
(perhaps some of those numbers are a little odd too....)
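For what it's worth, the recon statistics can be cross-checked with a little arithmetic. The sketch below uses only values copied from the kernel messages and the component label, and it does three things:

- It confirms that end time minus start time gives exactly the reported 618.804143 seconds.
- It works out an average rebuild rate, assuming the whole component (numBlocks x 512 bytes) was rewritten.
- It shows that the "max exec ticks" value is within about 0.1 s of the start time expressed in microseconds.

That last point suggests that the counter is reporting an absolute timestamp rather than an interval. This has not been verified against the RAIDframe source.

```python
# Cross-check of the reconstruction statistics quoted above.
# All constants are copied from the kernel messages / component label.
# ASSUMPTION: the whole component (numBlocks x blocksize) was rebuilt.

def to_usec(sec: int, usec: int) -> int:
    """Combine a (seconds, microseconds) pair into integer microseconds."""
    return sec * 1_000_000 + usec

start_us = to_usec(1110953957, 955365)   # "start time ... sec ... usec"
end_us = to_usec(1110954576, 759508)     # "end time ... sec ... usec"
elapsed_s = (end_us - start_us) / 1e6    # compare with "Recon time was ..."

num_blocks = 71131904                    # "numBlocks" in the component label
block_bytes = 512                        # "blocksize" in the component label
component_bytes = num_blocks * block_bytes
avg_mib_per_s = component_bytes / elapsed_s / 2**20

max_exec_ticks = 1110953957863819        # "max exec ticks" line
ticks_vs_start_s = (start_us - max_exec_ticks) / 1e6

print(f"elapsed:        {elapsed_s:.6f} s")
print(f"component size: {component_bytes / 2**30:.1f} GiB")
print(f"average rate:   {avg_mib_per_s:.1f} MiB/s")
print(f"max exec ticks, read as usec, is {ticks_vs_start_s:.3f} s before start")
```

The average of roughly 56 MiB/s sits a little below the ~60M/s seen in systat earlier. That fits with the rebuild slowing down towards the end.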
[console]<@> # raidctl -v -s raid0
Components:
/dev/sd1a: optimal
component1: spared
Spares:
/dev/sd0a: used_spare
Component label for /dev/sd1a:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 1412893, Mod Counter: 108
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 71131904
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
component1 status is: spared. Skipping label.
Component label for /dev/sd0a:
Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 1412893, Mod Counter: 108
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 71131904
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
[console]<@> # df
Filesystem 1M-blocks Used Avail %Cap Mounted on
/dev/raid0a 1968 619 1250 33% /
/dev/raid0d 9844 5461 3889 58% /usr/pkg
/dev/raid0e 18440 43 17475 0% /var
mfs:88 969 0 920 0% /tmp
/dev/sd7a 712380 0 705256 0% /home
/dev/sd6a 716634 842 708625 0% /var/log
/dev/sd5a 1040029 1382 1028246 0% /var/spool/imap
And after one final reboot from sd0, adding the real spare, etc., all is well:
[ttyp0]<woods@newpub> # raidctl -v -s raid0
Components:
/dev/sd1a: optimal
/dev/sd0a: optimal
Spares:
/dev/sd2a: spare
Component label for /dev/sd1a:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 1412893, Mod Counter: 114
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 71131904
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
Component label for /dev/sd0a:
Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 1412893, Mod Counter: 114
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 71131904
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
/dev/sd2a status is: spare. Skipping label.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
> > Once I get to the point of booting from the mirrored root then I'll send
> > you my diffs
>
> Ok. I suspect the diffs will be quite large -- a lot of stuff has
> changed. Might be good to keep them around in case other folks are
> interested in them, but I'm not sure I'd want to request a pullup of
> that size for 1.6.x :-} (The releng folks would probably shoot me :) )
Ah, no, I meant the diffs to -current that are necessary to do the
backport..... (that's the only way I'll be able to maintain this change
in my own trees)
They're quite small, less than 800 lines as a unidiff, even including
some added debugging messages and a wee readme that reminds me how to
update my source trees.
As for whether it's worth doing the backport officially or not, well it
does seem essential for anyone wanting to use RAIDframe on any 1.6.x SMP
platform. :-)
--
Greg A. Woods
H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Secrets of the Weird <woods@weird.com>