Subject: Re: Filesystems vs. device sector sizes
To: Pavel Cahyna <pavel@netbsd.org>
From: Bill Stouder-Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 07/26/2007 12:05:53
On Thu, Jul 26, 2007 at 11:36:36AM +0200, Pavel Cahyna wrote:
> On Wed, Jul 25, 2007 at 10:46:25PM -0700, Bill Stouder-Studenmund wrote:
> >
> > If EFS really really needs files smaller than device blocks, you need to
> > use something like vnd. I don't envision us ever hacking the buffer cache
> > to handle sub-device-block entities.
>
> Could the driver (cd) be taught to support 512b requests from upper layers
> by splitting sectors itself? That is, effectively pretend that the device
> block is 512b?
>
> Actually this is apparently already implemented. cd.c contains:
> -----
> /*
> * If the disklabel sector size does not match the device
> * sector size we may need to do some extra work.
> */
> if (lp->d_secsize != cd->params.blksize) {
>
> /*
> * If the xfer is not a multiple of the device block size
> * or it is not block aligned, we need to bounce it.
> -----
>
> But apparently you need to set sector size in the disklabel to 512b
> otherwise cdstrategy will reject such requests.

If the disk isn't labeled to need 512-byte sectors, cdstrategy certainly
should reject such i/o. Given that these discs were created for SGI
systems that did 512-byte i/o, they probably have 512-byte-sector disklabels.
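
As a rough illustration of the bouncing that the quoted cd.c comment
describes, here is a minimal user-space sketch. It is not the driver's
code: every sketch_* name, the in-memory "device", and the 2048/512 sizes
are assumptions made for the example. It serves 512-byte label sectors
from a device that can only read whole 2048-byte blocks by reading the
covering block into a bounce buffer and copying out the requested part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_DEV_BLKSIZE 2048u /* assumed device sector size */
#define SKETCH_LBL_SECSIZE 512u  /* assumed disklabel sector size */

/* Hypothetical stand-in for the medium: only whole blocks are readable. */
struct sketch_dev {
	const uint8_t *media;
	size_t nblocks;
};

/* Read one whole device block; fails past the end of the medium. */
int
sketch_dev_read_block(const struct sketch_dev *dev, size_t blkno, uint8_t *dst)
{
	if (blkno >= dev->nblocks)
		return -1;
	memcpy(dst, dev->media + blkno * SKETCH_DEV_BLKSIZE,
	    SKETCH_DEV_BLKSIZE);
	return 0;
}

/*
 * Read nsect label-sized sectors starting at label sector lsect.
 * Each covering device block is read into a bounce buffer, and only
 * the bytes the caller asked for are copied out.
 */
int
sketch_read_label_sectors(const struct sketch_dev *dev, size_t lsect,
    size_t nsect, uint8_t *dst)
{
	uint8_t bounce[SKETCH_DEV_BLKSIZE];
	size_t off = lsect * SKETCH_LBL_SECSIZE;  /* byte offset on medium */
	size_t left = nsect * SKETCH_LBL_SECSIZE; /* bytes still to copy */

	assert(dst != NULL);
	while (left > 0) {
		size_t blkno = off / SKETCH_DEV_BLKSIZE;
		size_t skip = off % SKETCH_DEV_BLKSIZE;
		size_t chunk = SKETCH_DEV_BLKSIZE - skip;

		if (chunk > left)
			chunk = left;
		if (sketch_dev_read_block(dev, blkno, bounce) != 0)
			return -1;
		memcpy(dst, bounce + skip, chunk);
		dst += chunk;
		off += chunk;
		left -= chunk;
	}
	return 0;
}
```

A request for label sectors 3 and 4 straddles two device blocks here, so
the loop reads both blocks and copies 512 bytes out of each. Requests that
are already block-sized and block-aligned would not need the bounce buffer
at all; the sketch bounces everything only to stay short.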

Otherwise, I think we should use vnd. There are other disc and disk
technologies that use large sectors, so we'll need to solve this problem
more than once. Or fix it in a way that's reusable.

> Maybe it is easier than teaching other parts of the kernel that
> DEV_BSIZE is not a constant.

That's not the problem, though. As of now, DEV_BSIZE just happens to be a
constant that we use to label struct buf offsets and block counts.

The problem is that the file system in question here, EFS, wants to use
i/o transfer sizes that are smaller than the smallest the device will do.
My recollection of the discussions back when Koji was working on this was
that this problem was considered a subcase of the DEV_BSIZE issues. It was
also considered more of a specialized case, and as such would/could/should
have a separate solution. Like vnd.
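
To make the distinction concrete, here is a small sketch that treats
DEV_BSIZE purely as the unit in which a transfer's starting block is
expressed, then checks whether the transfer is something a device with a
given sector size can do directly. The struct and the sketch_* names are
hypothetical and only loosely modeled on struct buf; the value 512 is
assumed for the unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DEV_BSIZE 512 /* assumed unit for block numbers */

/* Hypothetical, simplified view of a transfer request. */
struct sketch_xfer {
	int64_t blkno;  /* starting block, in SKETCH_DEV_BSIZE units */
	int64_t bcount; /* transfer length, in bytes */
};

/* True if the transfer starts and ends on device sector boundaries. */
bool
sketch_xfer_is_aligned(const struct sketch_xfer *x, int64_t secsize)
{
	int64_t byteoff = x->blkno * SKETCH_DEV_BSIZE;

	return byteoff % secsize == 0 && x->bcount % secsize == 0;
}

/* Device sector containing the start of the transfer. */
int64_t
sketch_xfer_to_sector(const struct sketch_xfer *x, int64_t secsize)
{
	return x->blkno * SKETCH_DEV_BSIZE / secsize;
}
```

With a 2048-byte sector, a transfer at block 4 for 2048 bytes maps cleanly
to sector 1, while a 512-byte transfer at block 5 does not. The unit used
for the block number is not what gets in the way; the transfer is simply
smaller than the device's sector.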
> What is the "device block" for devices that can perform reads in smaller
> chunks than writes, anyway? The write unit size or read unit size? (iirc
> DVD+RW and RAID 5 arrays are examples of this.)

I'm not sure about DVD+RW, but RAID 5 can write in the same size units it
can read. RAID arrays show up as disk drives, and they support
sector-sized i/o. So you can write just one sector on a RAID 5.

You're right that you don't WANT to do this often (and that writing a
whole stripe at once is MUCH better), but you can do it. :-)

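
For reference, the usual way a single-sector RAID 5 write works is a
read-modify-write of the parity: new parity = old parity XOR old data XOR
new data. The sketch below models one stripe in memory to show why that is
possible but costly. The layout (three data sectors plus one parity
sector) and all sketch_* names are assumptions for illustration, not any
particular RAID implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SECSIZE 512 /* assumed sector size */
#define SKETCH_NDATA   3   /* assumed data disks per stripe */

/* One stripe: a sector from each data disk plus the parity sector. */
struct sketch_stripe {
	uint8_t data[SKETCH_NDATA][SKETCH_SECSIZE];
	uint8_t parity[SKETCH_SECSIZE];
};

/*
 * Full-stripe write: parity is computed from the new data alone,
 * so nothing has to be read back from the disks first.
 */
void
sketch_full_parity(struct sketch_stripe *s)
{
	memset(s->parity, 0, SKETCH_SECSIZE);
	for (size_t d = 0; d < SKETCH_NDATA; d++)
		for (size_t i = 0; i < SKETCH_SECSIZE; i++)
			s->parity[i] ^= s->data[d][i];
}

/*
 * Single-sector write: on real disks this costs two reads (old data,
 * old parity) and two writes (new data, new parity).
 */
void
sketch_write_one_sector(struct sketch_stripe *s, size_t disk,
    const uint8_t *newdata)
{
	assert(disk < SKETCH_NDATA);
	for (size_t i = 0; i < SKETCH_SECSIZE; i++) {
		s->parity[i] ^= s->data[disk][i] ^ newdata[i];
		s->data[disk][i] = newdata[i];
	}
}
```

After sketch_write_one_sector() the parity matches what
sketch_full_parity() would compute from scratch, which is the sense in
which a one-sector write works, just with extra i/o per sector written.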
Take care,
Bill