Subject: Re: RAIDFrame and NetBSD/sparc booting issues
To: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
From: Greg Oster <oster@cs.usask.ca>
List: port-sparc
Date: 08/13/2003 18:28:37
Brian Buhrow writes:
> Hello. I've followed this entire discussion with much interest and
> thought more about the way raidframe works now and the question of how to
> make things work more transparently on raid-1 systems with existing boot
> roms. I realized I have a couple of questions and would like to make sure
> my understanding of the current layout of raidframe partitions is correct.
>
> 1. It's my understanding that the area protected by RF_PROTECTEDSECTORS is
> designed to include such things as the physical disklabel, any boot
> strapping code that might reside on the physical disk, and the raid
> component label itself.
Not quite. RF_PROTECTEDSECTORS is there to tell the data-writing
bits of RAIDframe "don't touch this space at the beginning of a
component". It turns out that the space we skip over contains
all of the things you mention above :)
> This would imply that disklabel -r sd0 or wd0
> should read the label out of this protected region, assuming the raid
> partition includes the entire disk. Is this right?
No. If the disklabel usually lives at block 0 for the arch, then the
disklabel for raid0 will be found at block 0 of the data area of the RAID
set. (i.e. typically at block RF_PROTECTEDSECTORS of the first component
of any RAID set). Doing a 'disklabel sd0' or 'disklabel wd0' will
get the label from the underlying drive, but that will never contain
the disklabel for raid0 (at least as things currently stand).
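
To make the offset concrete, here is a rough sketch of the arithmetic
for a RAID 1 set. It is illustrative only: the value 64 for
RF_PROTECTEDSECTORS is an assumption (check the RAIDframe headers in
your tree), and raid1_component_sector() and its part_offset parameter
are hypothetical names, not anything in the RAIDframe sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: sectors skipped at the start of every component. */
#define RF_PROTECTEDSECTORS 64

/*
 * Hypothetical helper: translate a block number on a RAID 1 set into
 * an absolute sector on the disk holding one component.
 * part_offset is the sector where the component partition starts.
 */
uint64_t
raid1_component_sector(uint64_t raid_block, uint64_t part_offset)
{
	/* Overflow is not handled; this is only a sketch. */
	assert(raid_block <= UINT64_MAX - RF_PROTECTEDSECTORS);
	return part_offset + RF_PROTECTEDSECTORS + raid_block;
}
```

With those assumptions, block 0 of raid0 (where its disklabel would
live on an arch that keeps the label at block 0) sits
RF_PROTECTEDSECTORS sectors into the component, which is why reading
the label of sd0 or wd0 never sees it.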
> 2. It looks to me like most of the boot loaders work in such a way that
> the first stage loader has the block numbers of the second stage loader
> hard coded into them, meaning that the second stage loader could be loaded
> from any portion of the disk, including the first portion of an ffs
> filesystem inside a raid-1 partition. Once the second stage loader is
> loaded, I believe space and code constraints are sufficiently removed, that
> the second stage loader could properly locate a kernel inside a raid-1 set
> or a physical disk directly -- no? If this is so, then it strikes me as
> easier to teach the second stage boot loader how to locate a kernel either
> in an FFS filesystem in a raid-1 set or in an FFS filesystem in a physical
> partition. Of course, once the kernel is loaded, it can already find the
> FFS filesystems inside any raid sets, so that problem is solved.
Some of this is already done and working, with much thanks to others.
> (I should note, that it would also be necessary for the installboot program
> to know about FFS filesystems inside raid-1 partitions as well, just so it
> can plug in the right numbers for the second stage loader, even if that
> loader is inside an FFS filesystem in a raid-1 set. (Presumably, it could
> also locate the loader in a raid-5 set, but of course that wouldn't
> actually boot unless the kernel happened to fit inside one of the stripes
> of the raid, but that's an entirely different problem :).))
I'd completely ignore RAID 0 and RAID 5 for this... while getting a
kernel from a RAID 0 is just a matter of a little math, getting a kernel
loaded off of a failed RAID 5 set is a *lot* of math.
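
For the curious, the "little math" for RAID 0 looks roughly like the
sketch below, assuming a plain round-robin layout of stripe units
across the components. The struct, the function name, the parameter
names and the value 64 for RF_PROTECTEDSECTORS are all assumptions
made for illustration; this is not code from RAIDframe or from any
boot loader.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: sectors skipped at the start of every component. */
#define RF_PROTECTEDSECTORS 64

/* Hypothetical result type: which component, and which sector in it. */
struct raid0_loc {
	unsigned component;
	uint64_t sector;
};

/*
 * Hypothetical helper: locate a RAID 0 block, assuming stripe units
 * are dealt out to the components in simple round-robin order.
 */
struct raid0_loc
raid0_locate(uint64_t raid_block, uint64_t sect_per_su, unsigned ncomponents)
{
	struct raid0_loc loc;
	uint64_t su, off;

	assert(sect_per_su > 0 && ncomponents > 0);

	su = raid_block / sect_per_su;	/* stripe unit number */
	off = raid_block % sect_per_su;	/* offset inside that unit */

	loc.component = (unsigned)(su % ncomponents);
	loc.sector = RF_PROTECTEDSECTORS +
	    (su / ncomponents) * sect_per_su + off;
	return loc;
}
```

A degraded RAID 5 set would additionally need parity reconstruction
for every block that lived on the failed component, which is where
the "*lot* of math" comes from.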
> If that problem is solved, I fail to see the need for moving a
> component label around and thus having to special case the raid-1 instance
> inside the raidframe code. This would, I believe, free up Greg O's time
> to look into issues like:
>
> 1. Determining the feasibility of changing the autoconfiguration code to
> account for hot standby partitions, and being able to auto-reconstruct into
> them in the event of a component failure without user intervention.
>
> 2. Examine why paging to a raid-5 set causes hangs.
I've actually gained a fair bit of insight into this over the last
month (with much thanks to the help and 400MB kernel core files provided
by an unnamed source). I won't go into the details here, but basically
if there is any paging (and not just to swap space!) to any device
that doesn't have a malloc-free code-path, then the potential is
there for a deadlock. RAIDframe doesn't need to be involved here,
though I haven't had time to come up with a repeatable test case yet.
(Softdeps seems to speed up the hang, since it seems to have a lot
more pages that are not PG_CLEAN).
> My point here is that I believe this discussion started because there
> is some concern that teaching a system to boot from a raid-1 set is not as
> straightforward as it should be. Unless I'm gravely mistaken, and please
> tell me if I am, this deficiency can be met by modifying the
> orders-of-magnitude less complicated boot loader programs for the various
> architectures than by modifying the raidframe system itself.
Considering the amount of time I have for RAIDframe hacking at the
present, small changes to boot loaders look very good to me :-}
Later...
Greg Oster