Subject: Re: MTD devices in NetBSD
To: Garrett D'Amore <garrett_damore@tadpole.com>
From: Bill Studenmund <wrstuden@netbsd.org>
List: tech-kern
Date: 03/23/2006 12:09:40
On Thu, Mar 23, 2006 at 10:26:31AM -0800, Garrett D'Amore wrote:
> Bill Studenmund wrote:
> >
> > We can do this even within a block device.
> >
> > Well-chosen calls to your strategy routine will work smoothly, and you
> > have an ioctl interface for things like erase and whatever other calls
> > you need.
> >
> > I guess a way to put it is to think of using one interface in two
> > different ways as opposed to an interface "below" another one.
>
> I've been thinking about this as well. I think this idea implies that
> the "block" size of these things would match the native sector size.
Yes & no. We can look at how cd9660 handles this, as it has the same
issue (2k sectors != 512 byte sectors).
> Mapping blocks to sectors 1:1 also means that for a lot of filesystems,
> you are going to have a lot of waste (e.g. does the filesystem allow for
> files to use less than a full device block) -- and this could be very,
> very undesirable on some systems. (E.g. 128K minimum file size on 4MB
> flash limits you to only 32 files. 16MB only gives 128 files.) 128K
> sector sizes are rare, but 64K sector sizes are *very* common. So you
> get 256 files in a 16MB "common" case.
>
> Hence, I think 1:1 block/sector mapping is a poor (even unworkable) choice.
Can you read less than a block in these things?
> So, if the abstraction is going to use a smaller block size -- say 512
> bytes -- to get good allocation, we have other problems:
>
> For the rest of the discussion, let's assume a 64K sector size (the most
> common NOR flash size, I think):
>
> A naive implementation would make updating a sector an erase/modify
> cycle. Obviously this is bad, because writing (or updating) a 64K file
> now requires 128 erase cycles. Erase takes a long time, and wears down
> flash. This is unworkable.
Wait, I'm now confused. I thought we had one of three cases:
1) We have a flash-unaware file system sitting on a flash. This would be
intended as a r/o kinda thing to help with bring-up.
2) We have a flash-unaware file system on top of a wear-leveling layer on
the flash. This should work r/w.
3) We have a flash-aware file system sitting on a flash.
The case above isn't one of those three, so why do we care?
> So a non-naive implementation means you have to look at the bits you are
> updating to decide whether or not an erase is necessary. This means
> knowing the "set/clear" behavior of the bits, which isn't a problem.
> (The devices I've seen are all "set" on erase, and you can only clear
> individual bits.)
>
> But now, when I'm writing a 64K file I'm going to have to do 128 reads
> and writes. And, if the sector has unfortunately got a single bit clear
> near the end, I've not detected this case, and I wind up having to do a
> read-modify-write even after I've done all the work to try to avoid it.
I'm still confused. :-) 1) I don't think a file system will really use
512-byte blocks internally. You'd have to specifically set it, and I'm not
sure it'd be worth it.
2) If you're writing a 64k file, you aren't going to have 512-byte writes
coming in unless you've mis-configured dd. ;-) stdio will do 8k i/o, and
you'll get better performance with large block sizes in dd...
> If I operate on sectors natively, and expose that to the filesystem,
> then the filesystem can do an upfront check, erase the sector as needed,
> and *then* do the write, all at once. (Assuming again we are writing a
> 64k file.) Since the filesystem knows it's a 64k write, it can do "the
> right thing".
>
> I think this means that the filesystem should *really* have a lot more
> direct control over the device, and be able to operate on sectors rather
> than blocks. (And we've already ruled out a 1:1 sector/block mapping,
> at least if you are going to want to be able to put any other kind of
> ordinary filesystem down on these for a readonly filesystem.)
>
> Therefore, I'm coming to the conclusion that we need to expose *sectors*
> to a flash-aware filesystem, and the block abstraction is poor for these
> filesystems.
>
> Am I missing something here?
I think you're painting yourself into corners we don't need to be trapped
in.
If the flash-unaware fs is only used in r/o mode, why do we need to worry
about its write performance?
The "it's HARD to solve the problem" reason is quite reasonable at
times, and this may well be one.
Take care,
Bill