Subject: Re: some problems with "old" RAIDframe arrays on netbsd-1-6
To: NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.ORG>
From: Greg Oster <oster@cs.usask.ca>
List: tech-kern
Date: 10/19/2003 15:13:27
"Greg A. Woods" writes:
> I'm having some problems with a pair of RAID-5 arrays that were created
> when the system was running 1.5W (-current as of about 2001/06/24).
> I've since upgraded the system to 1.6.1_STABLE (netbsd-1-6 as of about
> 2003/09/06).
>
> The first (and most critical) problem is that I'm unable to add a spare
> to one of the arrays (in order to replace a failed component):
>
> # raidctl -v -a /dev/sd6a raid0
> raidctl: ioctl (RAIDFRAME_ADD_HOT_SPARE) failed: Invalid argument
>
> After/as the command above runs the kernel prints the following on the
> console:
>
> Spare disk /dev/sd6a (512 blocks) is too small to serve as a spare (need 8890688 blocks)
>
> I.e. RAIDframe isn't seeing the disk's new label properly. In fact it
> is as follows:
[snip]
>
> 8 partitions:
> #        size   offset    fstype   [fsize bsize cpg/sgs]
>  a:   8890697       63      RAID                   # (Cyl.    0*- 6140*)
>  c:   8890697       63    unused      0     0      # (Cyl.    0*- 6140*)
>  d:   8896512        0    unused      0     0      # (Cyl.    0 - 6143)
>
> Note I've made the `a' partition's size exactly match that of the other
> volumes in the array (they're all slightly different kinds of disks).
RAIDframe is saying that the partition isn't quite big enough. I'm not
sure why: if the spare is *at least* the size of one of the other
active partitions, it should work.
> (I don't have another disk big enough to try adding a spare to the
> second RAID-5 array to see if it's something specific to this array....)
>
> After fiddling with the label (trying a size of 8890688 before noticing
> that the first value in the kernel message was a very unlikely and oddly
> "even" number), I tried again only to be surprised by a new error:
>
> # raidctl -v -a /dev/sd6a raid0
> raidctl: ioctl (RAIDFRAME_ADD_HOT_SPARE) failed: Device busy
>
> I.e. the disk vnode continues to have a v_usecount of 1 even after
> raidctl exits. Somewhere a VOP_UNLOCK() or vput() call must be missing.
Hmm... This is fixed in -current, but "someone" forgot to request a
pullup. (Said pullup request has been sent.)
> I also noticed that the "Autoconfig" value isn't copied to new disks
> that have been added as spares in the past. My original "raid0" wasn't
> autoconfiguring properly and I found that one component didn't have
> "Autoconfig: Yes" any more. Rerunning "raidctl -A yes raid0" fixes it
> but I'd suggest the addition of a new component as a spare should
> inherit this value from the other components. Should I send-pr this?
> (I may try looking for a fix....)
I believe I fixed a bug related to that quite some time ago. I
thought it was fixed in 1.6, though.
> Also, as an aside, I never made any of my component partitions have an
> fstype of RAID before and yet autoconfig still worked in 1.5W. It was
> as if the check for (p_fstype != FS_RAID) wasn't happening before (even
> though I see the code right there in rf_netbsdkintf.c in my old source
> tree). This may be another hint about the disk labels not being read
> properly, though it still doesn't make any sense.
IIRC autoconfig used to work for even non-RAID partitions... but 1.5W
is so ancient that I don't recall exactly what it did :-}
> I'm going to reboot again now to make sure raid0 is really auto-
> configuring itself at boot and also to check that my new RAID-1 mirror
> for the root disk comes up properly....
>
> If anyone has any clues about the problem I'm having adding a spare to
> the RAID-5, please let me know!
Can you send the output of "raidctl -s raid0" and of "disklabel foo0"
where "foo0" contains one of the active components? RAIDframe is
usually only crabby about these sorts of things if there is an actual
size difference that will cause a problem.
Later...
Greg Oster