Subject: Work-in-progress "wedges" implementation
To: NetBSD tech-kern <tech-kern@netbsd.org>
From: Jason Thorpe <thorpej@shagadelic.org>
List: tech-kern
Date: 09/22/2004 13:26:34
Wedges are a new way of representing disk partitions in the NetBSD
kernel.
The basic idea is to decouple the internal representation of disk
partitions from the on-disk representation. Currently, the NetBSD
kernel uses "struct disklabel" (a.k.a. BSD disklabel) for both in-core
and on-disk representation, and operates on this structure exclusively.
The main problem is that some platforms use (by necessity) on-disk
representations other than the BSD disklabel. This is generally to
maintain compatibility with another OS on the platform (e.g. Mac OS on
a Macintosh), or because the system firmware understands a particular
format (e.g. Sun PROMs understand Sun disklabels).
In order to handle this "other format", individual platforms may support
an alternative on-disk representation. In the kernel, this is
represented by "struct cpu_disklabel". Unfortunately, there are
drawbacks to this approach:
- Cross-platform disk portability is basically non-existent.
- The BSD disklabel cannot represent all of the pertinent
information of some other on-disk representations, and
vice-versa. This includes number of partitions and
partition names.
Another problem is the fact that the BSD disklabel uses 32-bit fields
for block numbers. This means that the largest disk that the BSD
disklabel can describe is 2TB (with 512-byte sectors), which is not
terribly large by today's standards.
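The 2TB figure follows directly from the field width. A quick check of the
arithmetic, assuming 512-byte sectors (the sector size is an assumption
here; it is not stated in this message):

```python
# Largest disk addressable with 32-bit block numbers.
# SECTOR_SIZE = 512 is an assumption; this message does not state one.
SECTOR_SIZE = 512
BLOCK_NUMBER_BITS = 32

max_blocks = 2 ** BLOCK_NUMBER_BITS        # number of distinct block numbers
max_bytes = max_blocks * SECTOR_SIZE       # bytes reachable through the label
max_tib = max_bytes // 2 ** 40             # same value expressed in TiB

# The same calculation with 64-bit block numbers, for comparison.
max_bytes_64 = (2 ** 64) * SECTOR_SIZE
```

With these assumptions max_tib comes out as 2, and the 64-bit limit is
2**73 bytes, far beyond any disk likely to be attached.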
Finally, in a world with hot-plug busses where devices may appear and
disappear at any time, deterministic disk probe ordering does not exist.
The old-fashioned disk naming scheme is not very usable in this
scenario.
Wedges solve these problems in the following ways:
- Disk partitions are represented in the kernel as separate
block devices, and there can be an arbitrary number of these
associated with a disk. Each wedge internally uses 64-bit
block numbers to support partitions > 2TB.
- Wedges include a modular partition discovery framework, allowing
different partition formats to be supported seamlessly on all
platforms. A module for the EFI GUID Partition Table (GPT)
format, which includes arbitrary numbers of partitions, 64-bit
block numbers, and Unicode partition names, is included.
- Wedges may also be configured using ioctls from user space,
allowing partition handling to be pushed out of the kernel,
if desired.
- Wedges are "named". That is, each wedge has an associated
name encoded in UTF-8. This name can be used to create a
device node in /dev to decouple the wedge's identity from
its probe-order-dependent unit number. Duplicate names are
suppressed, and partition discovery modules can try alternate
names in the event of a collision. For example, the GPT module
may try the Unicode name associated with the GPT partition, and
if that already exists, it may try again using the string
representation of the partition's GUID.
- Wedges represent partition types as strings, allowing for
arbitrary partition types.
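The name-collision fallback described in the list above can be sketched
as follows. This is a user-space Python model, not code from the patch;
the function name pick_wedge_name and the sample GUID are hypothetical
and exist only for illustration:

```python
def pick_wedge_name(candidates, existing_names):
    """Return the first candidate name not already in use.

    candidates is an ordered list of names to try (for a GPT partition,
    the Unicode label first, then the GUID string). Returns None if
    every candidate collides with an existing wedge name.
    """
    for name in candidates:
        if name and name not in existing_names:
            return name
    return None


# Hypothetical example: a wedge named "root" already exists, so the
# partition label collides and the GUID string is used instead.
existing = {"root", "swap"}
label = "root"
guid = "01234567-89ab-cdef-0123-456789abcdef"  # made-up GUID string
chosen = pick_wedge_name([label, guid], existing)
```

Here chosen ends up being the GUID string, since the label is already
taken. What a discovery module should do when every candidate collides
is not addressed by this sketch.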
The wedges implementation is a work-in-progress at the moment, designed
to allow for the use of old-style disk naming while wedges are still
under development. Features of the current wedges implementation:
1. More items are moved from individual disk softc structures
into "struct disk". Among other things, this allows for
information sharing and better synchronization between
wedges and their parent disks.
2. I/O is enqueued on the wedge and a new buf allocated in order
to perform I/O on the parent. This is a transitional measure;
I would like to eventually make it possible for disk drivers to
operate directly on the buf provided to the wedge.
3. Once wedges are created on a disk, I/O to that disk may only
be performed through its wedges, or on the disk's RAW_PART.
Wedges may not be created on a disk if any partition other
than RAW_PART is open.
4. A minphys entry point is added to "struct dkdriver". Eventually,
I would like to fully utilize "struct dkdriver" as the interface
to a disk from a wedge, rather than using a vnode. Once we are
fully transitioned to wedges, I would like to see the traditional
entry points to disk drivers go away, with the exception of an
entry point for the raw disk, so that partitions may be created
on it.
5. My patch includes modifications to make wedges work with the
"wd" driver. I will convert the other disk drivers over time. An
outstanding question: What should we do about floppy drives?
6. I have modified fsck and mount to use the partition type names
that wedges provide. Conveniently, I have defined names that
match the fsck_* and mount_* names for the various partition
types that indicate file systems.
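To make item 2 above concrete: performing I/O on the parent on behalf of
a wedge amounts to bounds-checking the request against the wedge and
adding the wedge's starting offset. The following is a user-space Python
model of that idea only; the Wedge class, its field names, and
to_parent_block are hypothetical and are not taken from the patch:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wedge:
    """Illustrative model of a wedge (field names are assumptions)."""
    name: str    # wedge name (UTF-8 in the kernel)
    ptype: str   # partition type as a string, e.g. "ffs"
    offset: int  # first block of the wedge on the parent disk
    size: int    # length of the wedge in blocks


def to_parent_block(wedge, blkno, nblocks):
    """Translate a wedge-relative request to a parent-disk block number.

    Raises ValueError if the request does not fit inside the wedge.
    """
    if blkno < 0 or nblocks <= 0 or blkno + nblocks > wedge.size:
        raise ValueError("request extends outside the wedge")
    return wedge.offset + blkno


# A wedge starting beyond the range of a 32-bit block number; Python
# integers are unbounded, standing in for the kernel's 64-bit fields.
home = Wedge(name="home", ptype="ffs", offset=2 ** 32, size=1000)
first = to_parent_block(home, 0, 8)
```

A request for the last 8 blocks (blkno 992) is accepted, while one
starting at block 993 is rejected because it would run past the end of
the wedge.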
Known issues:
1. You can't currently newfs a wedge. This is because newfs
requires the old-style DIOCGDLABEL ioctl, which wedges do
not support. I am working on a means for exporting the
parent disk's geometry through the wedge, which is what
newfs wants.
2. Related to (1), what to do about the block size / frag size
entries in "struct partition" (part of "struct disklabel",
and thus antiquated, obsolete, and not part of wedges)?
I would like to get "wedges" checked into the tree to allow for greater
collaboration on it. Since it does not interfere with the use of disks
through the traditional interface, I don't think it's necessary to put
this on a branch.
Diffs for review are at:
ftp://ftp.shagadelic.org/pub/wedge-diffs.txt
Thanks.
-- Jason R. Thorpe <thorpej@shagadelic.org>