Subject: Disklabel ioctls (was Re: nore on disk ...)
To: Gordon Ross <gwr@mc.com>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: tech-kern
Date: 11/14/1995 10:15:48
[ Time to change the subject, since the old one only shows off my
poor typing :-) ]
On Tue, 14 Nov 95 11:15:19 EST
"Gordon W. Ross" <gwr@mc.com> wrote:
> Or do it like SunOS: <sun/dkio.h>
>
> /*
> * Disk io control commands
> */
> #define DKIOCGGEOM _IOR(d, 2, struct dk_geom) /* Get geometry */
> #define DKIOCSGEOM _IOW(d, 3, struct dk_geom) /* Set geometry */
> #define DKIOCGPART _IOR(d, 4, struct dk_map) /* Get partition info */
> #define DKIOCSPART _IOW(d, 5, struct dk_map) /* Set partition info */
> #define DKIOCINFO _IOR(d, 8, struct dk_info) /* Get info */
Yah, it would be nice to have these implemented, if only for the
sake of binary compatibility.
However, I suggested the semantics I did given that we have:
DIOCGDINFO get disklabel
DIOCSDINFO set in-core disklabel
DIOCWDINFO set in-core disklabel, update disk
DIOCGPART get partition info; only really useful in kernel
Since our disklabel contains geometry (albeit potentially incorrect if
misconfigured) and partition info, it seemed that something that simply
asks the disk for its geometry was more in order. Sort of like a "generic"
version of a SCSI rigid-geometry mode sense and identify. Maybe this
could be generalized a bit more to provide vendor, model, and revision
strings, if applicable.
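To make the idea a little more concrete, here is a rough C sketch of what the argument to such a "just ask the disk" ioctl might carry. Everything in it is hypothetical (struct dioc_geometry and its field names are made up for illustration, not an existing interface); the only real detail is the layout of standard SCSI INQUIRY data, where the vendor, product, and revision strings are blank-padded ASCII in bytes 8-15, 16-31, and 32-35.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument for a generic "ask the disk" geometry ioctl. */
struct dioc_geometry {
	uint32_t dg_secperunit;		/* total sectors reported by the device */
	uint32_t dg_nsectors;		/* sectors/track (may be an estimate) */
	uint32_t dg_ntracks;		/* tracks/cylinder (may be an estimate) */
	uint32_t dg_ncylinders;		/* cylinders (may be an estimate) */
	char	 dg_vendor[8 + 1];	/* NUL-terminated, trailing blanks removed */
	char	 dg_model[16 + 1];
	char	 dg_revision[4 + 1];
};

/* Copy a blank-padded INQUIRY field and strip the trailing blanks. */
static void
copy_inq_field(char *dst, const unsigned char *src, size_t len)
{
	memcpy(dst, src, len);
	dst[len] = '\0';
	while (len > 0 && dst[len - 1] == ' ')
		dst[--len] = '\0';
}

/*
 * Fill in the identification strings from standard SCSI INQUIRY data:
 * vendor is bytes 8-15, product bytes 16-31, revision bytes 32-35.
 */
void
dioc_geometry_set_ident(struct dioc_geometry *dg, const unsigned char inq[36])
{
	assert(dg != NULL && inq != NULL);
	copy_inq_field(dg->dg_vendor, inq + 8, 8);
	copy_inq_field(dg->dg_model, inq + 16, 16);
	copy_inq_field(dg->dg_revision, inq + 32, 4);
}
```

The geometry fields would be filled in by whatever the driver can learn (mode sense for SCSI, identify for IDE), with dg_secperunit taken from the device's reported capacity rather than computed from the other three.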
Actually, I've been thinking about disklabels lately; that 30-minute
commute gives me that opportunity. I've been getting pretty frustrated
at some of the stupidity in our current disklabel semantics. I think
what bugs me the most is the fact that on some devices, notably SCSI, the
total size of the disk is often a few to several blocks larger than:
sectors/track * tracks/cylinder * cylinders
Indeed, I have a couple of disk manuals at work that flat out say that
the disk's reported geometry is _estimated_ based on observed averages.
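The arithmetic involved, as a small C sketch (the function names are hypothetical): the reported geometry only accounts for nsectors * ntracks * ncylinders sectors, and whatever the drive really has beyond that is slack that a strict size check will trip over.

```c
#include <assert.h>
#include <stdint.h>

/* Sectors covered by the reported (possibly averaged) geometry. */
uint64_t
geometry_sectors(uint32_t nsectors, uint32_t ntracks, uint32_t ncylinders)
{
	return (uint64_t)nsectors * ntracks * ncylinders;
}

/*
 * Sectors the drive really has beyond what the geometry accounts for;
 * 0 if the geometry covers the whole unit.
 */
uint64_t
geometry_slack(uint64_t secperunit, uint32_t nsectors, uint32_t ntracks,
    uint32_t ncylinders)
{
	uint64_t geom = geometry_sectors(nsectors, ntracks, ncylinders);

	assert(nsectors > 0 && ntracks > 0);	/* a zero geometry is meaningless */
	return secperunit > geom ? secperunit - geom : 0;
}
```

With an assumed 64 sectors/track, 16 tracks/cylinder, and 1000 cylinders, the geometry covers 1,024,000 sectors; a drive reporting 1,024,500 sectors leaves 500 sectors of slack. Bumping the cylinder count to 1001 raises the product to 1,025,024 and the slack to zero, which is why the workaround below starts by adding a cylinder.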
Disklabel(8)'s inability to deal with this is a problem. For example,
you have a new SCSI disk, you install it, and proceed to install the
disklabel. disklabel(8) chokes because partition `c' (or, whichever
RAW_PART happens to be on your system... *ahem*) runs past end of unit.
I.e. disklabel(8) did the math, and determined that your absolutely
correct disklabel is in error. (Remember: MI SCSI and other drivers set
the size of RAW_PART and some geometry info in the disklabel if no label
is read off disk). So, wanting to label the disk, you shrink `c'. BZZT!
Can't do that either; "open partition would move or shrink".
So, the way I bypassed this in the hp300 install.sh is, well, sickening:
* Increase the number of cylinders by one. Add a dummy partition
(`a' in this example) with offset 0 and size 1 (block). Label the disk
using `disklabel -r -R /dev/rsd?c labelfile'.
* Run `disklabel -e /dev/rsd?a'. Adjust cylinders back to proper
value, and adjust the size of `c' so that the math works out.
* Run `disklabel -e /dev/rsd?c' again and actually edit the
partition map to your liking.
I would propose something along these lines:
* DIOCWDINFO should _not_ return ESRCH in the event there's no
label on the disk. Instead, it should update the disk, just
like you asked it to.
* New ioctl: DIOCRDLABEL - attempt to do a raw read of the
label area _without_ actually updating the in-core copy. This
would provide a mechanism for accurately testing for the presence
of a disklabel in a machine-independent manner. Could return
ESRCH if there's not a suitable label. Should be passed the
following as an argument:
struct dioc_rdargs {
char drd_errbuf[128]; /* error message */
struct disklabel drd_label; /* fill me in */
};
The error message might be "magic number incorrect",
"checksum failure", or "disklabel not present". The current
mechanism for determining the cause of an error leaves a bit
to be desired.
* All knowledge of the on-disk format of the disklabel should
be pulled out of disklabel(8) and cursed at. The on-disk
format should be dealt with exclusively in the kernel, with
the exception of those ports that require access to the label
area to install the boot block. A program to handle that
should live in /usr/mdec.
* disklabel(8) should be more lenient when slapping one's
hands for an incorrect RAW_PART size. In particular, it should
probably print a message to stdout if the extra blocks amount to
less than one cylinder, and maybe to stderr if it looks really bad.
* The kernel _should_ allow an open partition to move or shrink
iff the following are true:
* The only partition open is RAW_PART.
* Only one flavor (char or block) of that partition is
open.
* The partition is open for writing.
* The "label is writable" flag for that disk is set.
Given the nature of RAW_PART, it's pretty difficult to
"move" it, but I think you get the idea :-)
* Ports (like the i386, which has to deal with the DOS
partition foo) should implement the machine-dependent foo
in the kernel, behind a generic interface:
DIOCMACHDEP - machine-dependent disk ioctls, read/write,
takes the following argument:
struct dioc_machdep {
int dmd_op; /* machdep operation */
void *dmd_data; /* pointer to data/buffer */
size_t dmd_datalen; /* size of data/buffer */
};
This is somewhat analogous to how sys_sysarch() works.
Any program taking advantage of this interface is inherently
machine-dependent, and should be condemned to spend its days in
/usr/mdec.
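For the DIOCRDLABEL proposal above, a rough sketch of the check the kernel might run over a raw read of the label area, filling in drd_errbuf with one of the suggested messages. Assumptions: struct mini_label is a heavily abridged, hypothetical stand-in for struct disklabel; the label sits at offset 0 of the buffer in host byte order; and the checksum is the BSD-style XOR of 16-bit words, which comes out to zero over a correctly checksummed label. DISKMAGIC is the value used in <sys/disklabel.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DISKMAGIC ((uint32_t)0x82564557)	/* as in <sys/disklabel.h> */

/* Hypothetical, heavily abridged stand-in for struct disklabel. */
struct mini_label {
	uint32_t d_magic;
	uint32_t d_secperunit;
	uint32_t d_magic2;
	uint16_t d_checksum;
	uint16_t d_npartitions;
};

/* Argument from the proposal, using the stand-in label. */
struct dioc_rdargs {
	char		  drd_errbuf[128];	/* error message */
	struct mini_label drd_label;		/* fill me in */
};

/* XOR of all 16-bit words; 0 for a label whose d_checksum is correct. */
uint16_t
label_cksum(const struct mini_label *lp)
{
	const unsigned char *p = (const unsigned char *)lp;
	uint16_t sum = 0, w;
	size_t i;

	for (i = 0; i + 1 < sizeof *lp; i += 2) {
		memcpy(&w, p + i, sizeof w);
		sum ^= w;
	}
	return sum;
}

static int
rd_fail(struct dioc_rdargs *args, const char *why)
{
	snprintf(args->drd_errbuf, sizeof args->drd_errbuf, "%s", why);
	return ESRCH;
}

/*
 * Examine a raw copy of the label area without touching any in-core
 * label.  Returns 0 and fills in drd_label, or ESRCH with a reason.
 */
int
rdlabel_check(const unsigned char *buf, size_t len, struct dioc_rdargs *args)
{
	struct mini_label lab;
	size_t i;

	assert(buf != NULL && args != NULL);
	args->drd_errbuf[0] = '\0';

	/* A short or all-zero label area holds no label at all. */
	for (i = 0; i < len && buf[i] == 0; i++)
		continue;
	if (len < sizeof lab || i == len)
		return rd_fail(args, "disklabel not present");

	memcpy(&lab, buf, sizeof lab);
	if (lab.d_magic != DISKMAGIC || lab.d_magic2 != DISKMAGIC)
		return rd_fail(args, "magic number incorrect");
	if (label_cksum(&lab) != 0)
		return rd_fail(args, "checksum failure");

	args->drd_label = lab;
	return 0;
}
```

A real implementation would also have to search for the label at the port's label offset and handle d_npartitions worth of partition entries in the checksum; the point here is only that the caller gets a specific reason back instead of a bare errno.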
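For the proposed leniency in disklabel(8), a sketch of how the RAW_PART size check might be graded rather than treated as pass/fail. The names are hypothetical, and treating "a cylinder or more over" as "looks really bad" is an assumption; the proposal leaves that threshold open.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum rawpart_verdict {
	RAWPART_OK,	/* fits within nsectors * ntracks * ncylinders */
	RAWPART_NOTE,	/* over by less than one cylinder: informational */
	RAWPART_WARN	/* over by a cylinder or more: looks really bad */
};

enum rawpart_verdict
check_rawpart(uint64_t size, uint32_t nsectors, uint32_t ntracks,
    uint32_t ncylinders)
{
	uint64_t secpercyl = (uint64_t)nsectors * ntracks;
	uint64_t geom = secpercyl * ncylinders;

	assert(secpercyl > 0);
	if (size <= geom)
		return RAWPART_OK;
	if (size - geom < secpercyl)
		return RAWPART_NOTE;
	return RAWPART_WARN;
}

/* Print a note on stdout or a warning on stderr; never refuse the label. */
enum rawpart_verdict
report_rawpart(char part, uint64_t size, uint32_t nsectors, uint32_t ntracks,
    uint32_t ncylinders)
{
	enum rawpart_verdict v = check_rawpart(size, nsectors, ntracks, ncylinders);

	if (v != RAWPART_OK) {
		uint64_t over = size - (uint64_t)nsectors * ntracks * ncylinders;
		FILE *out = (v == RAWPART_NOTE) ? stdout : stderr;

		fprintf(out, "partition %c: %llu sectors past the geometry%s\n",
		    part, (unsigned long long)over,
		    v == RAWPART_NOTE ? " (less than one cylinder)" : "");
	}
	return v;
}
```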
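For DIOCMACHDEP, a sketch of how a port might dispatch requests in the same spirit as sys_sysarch(): switch on dmd_op and reject anything unknown with EINVAL. The op codes (DMD_I386_GETMBR, DMD_I386_SETMBR) are made up for illustration, and a static buffer stands in for the disk; a real kernel would move dmd_data to and from user space with copyin()/copyout() and do actual I/O.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct dioc_machdep {
	int	 dmd_op;	/* machdep operation */
	void	*dmd_data;	/* pointer to data/buffer */
	size_t	 dmd_datalen;	/* size of data/buffer */
};

/* Hypothetical i386 operations: fetch or replace the DOS boot record. */
#define DMD_I386_GETMBR	1
#define DMD_I386_SETMBR	2
#define MBR_SIZE	512

static unsigned char fake_mbr[MBR_SIZE];	/* stands in for sector 0 */

/* Dispatch a DIOCMACHDEP request; unknown operations get EINVAL. */
int
machdep_diskioctl(struct dioc_machdep *dmd)
{
	unsigned char *p;

	assert(dmd != NULL);
	p = dmd->dmd_data;
	switch (dmd->dmd_op) {
	case DMD_I386_GETMBR:
		if (p == NULL || dmd->dmd_datalen < MBR_SIZE)
			return EINVAL;
		memcpy(p, fake_mbr, MBR_SIZE);	/* a kernel would copyout() */
		return 0;
	case DMD_I386_SETMBR:
		if (p == NULL || dmd->dmd_datalen != MBR_SIZE)
			return EINVAL;
		/* Refuse a sector without the 0x55 0xAA boot signature. */
		if (p[510] != 0x55 || p[511] != 0xAA)
			return EINVAL;
		memcpy(fake_mbr, p, MBR_SIZE);	/* a kernel would copyin() */
		return 0;
	default:
		return EINVAL;
	}
}
```

The MI side only has to pass the structure through; the meaning of dmd_op belongs entirely to the port, which is what keeps programs using it confined to /usr/mdec.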
Any comments/discussion? :-)
--------------------------------------------------------------------------
Jason R. Thorpe thorpej@nas.nasa.gov
NASA Ames Research Center Home: 408.866.1912
NAS: M/S 258-6 Work: 415.604.0935
Moffett Field, CA 94035 Pager: 415.428.6939