Subject: Re: Y2038, was as long as we're hitting FFS...
To: Ted Lemon <mellon@isc.org>
From: Bill Studenmund <wrstuden@nas.nasa.gov>
List: tech-kern
Date: 03/25/1999 16:38:50
On Thu, 25 Mar 1999, Ted Lemon wrote:
> Bill, please stop for a moment and tell yourself "this argument isn't
> about what *I* want. It's about what NetBSD users in general want."
> This statement is true about anything one might consider putting into
> NetBSD. I know it's been used on me more than once! :')
But on the flip side we also seem to have moved towards a policy where if
you're going to raise the bar on what has to be done, you have to be
willing to deliver on what you've asked for. :-)
> > And that it was not designed for what a lot of people are wanting it to do
> > - be a kitchen sink repository.
>
> Right. It was designed to solve exactly one problem: yours. That's
> the problem with the design, in a nutshell. Your problem is a
> reasonable and valid problem - don't get me wrong. But you've added
> a general mechanism that isn't actually general, and now you're
> getting pushback.
Though our problem guided the design, it is a general solution. It's just
not a solution to the general problem you're proposing. :-) Nor does it
claim to be. It claims to be a solution to the problem of overlay fs's
needing a small amount of per-inode storage.
> I don't see how this would be a problem. Because your application is
> as specialized as it is, it should be easy to arrange for your data to
> come first, and just do a sanity check to make sure it did. The
> performance impact of supporting more than one data hunk in the opaque
> data buffer should be negligible for your application if you do it
> this way. You get what you want, and you don't preclude other
> applications of the API.
>
> > I think it's fine to extend the interface a bit. Right now we have test,
> > get, set, and clear operations on the metadata. It seems easy to me to
> > extend them to take a magic number value, with (0) being the catch-all. So
> > then you can deal with different types of data, and even add an overlay fs
> > to store multiple types at once.
>
> Why not just do it right, so that we don't have to have people mount
> an overlay filesystem to get the correct (that is, general) behaviour
> later?
Because to satisfy the fully general case means sticking a database in the
filesystem. That strikes me as HARD. Actually, sticking a database into
the inode!
That's why.
Also, it strikes me as wrong. The database management should sit above the
fs, not in it. Even Apple did it that way - there's a resource fork which
the resource manager turns into all the fun little resources MacOS
programmers have grown to love. ;-)
And there's the fact that I don't see a problem which needs this
kitchen-sink solution. :-)
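To make the smaller extension concrete, here is a rough sketch of the
test/get/set/clear operations quoted above, taking a magic number with (0)
as the catch-all. Everything named here (struct opaque_meta, the meta_*
functions, the in_use field) is made up for illustration and is not the
proposed kernel interface; refusing to overwrite another type's data is
also just an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OPAQUE_LEN 96	/* blob size discussed in this thread */
#define MAGIC_ANY  0	/* catch-all: matches whatever type is stored */

/* Hypothetical stand-in for the per-inode opaque area. */
struct opaque_meta {
	int      in_use;		/* nonzero once data has been set */
	uint32_t magic;			/* type tag of the stored data */
	uint8_t  data[OPAQUE_LEN];
};

/* test: is there stored data that the caller's magic selects? */
static int
meta_test(const struct opaque_meta *m, uint32_t magic)
{
	assert(m != NULL);
	return m->in_use && (magic == MAGIC_ANY || magic == m->magic);
}

/* get: copy the blob out if the magic selects it */
static int
meta_get(const struct opaque_meta *m, uint32_t magic, uint8_t *out)
{
	if (!meta_test(m, magic))
		return -1;
	memcpy(out, m->data, OPAQUE_LEN);
	return 0;
}

/* set: store a blob under a concrete (non-zero) magic */
static int
meta_set(struct opaque_meta *m, uint32_t magic, const uint8_t *in)
{
	assert(m != NULL);
	if (magic == MAGIC_ANY)
		return -1;	/* stored data needs a real type tag */
	if (m->in_use && m->magic != magic)
		return -1;	/* assumption: don't clobber another type */
	memcpy(m->data, in, OPAQUE_LEN);
	m->magic = magic;
	m->in_use = 1;
	return 0;
}

/* clear: drop the blob if the magic selects it */
static int
meta_clear(struct opaque_meta *m, uint32_t magic)
{
	if (!meta_test(m, magic))
		return -1;
	memset(m, 0, sizeof(*m));
	return 0;
}
```

An overlay fs that wanted several types at once could sit on top of these
and multiplex on the magic value, which is the layering argued for above.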
> AFAIK, the F_GETLK/F_SETLK call is in POSIX, and that's functionally
> the same as what you're talking about.
Cool!
> Once features have gone in, they can't come out. Your proposed
> change will not be compatible with the general solution, so if we
> are ever going to implement the general solution, we have to do it
> now. Having a flags field won't help.
Yes, it will. Because there's nothing saying that the presence of other
flags can't preclude the use of the opaque data we're proposing. :-)
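As a sketch of how that could work: the opaque ops simply refuse to touch
the area whenever a flag they don't own is set. The flag names, struct
meta_area and the opaque_* functions below are hypothetical, invented only
to show the gating; MF_VARLEN stands in for whatever a later general
scheme might define.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OPAQUE_LEN 96		/* blob size discussed in this thread */
#define MF_OPAQUE  0x01u	/* hypothetical: area holds one opaque blob */
#define MF_VARLEN  0x02u	/* hypothetical: a later layout owns the area */

/* Hypothetical per-inode area: a flags word plus the raw bytes. */
struct meta_area {
	uint32_t flags;
	uint8_t  bytes[OPAQUE_LEN];
};

/* Any flag other than MF_OPAQUE precludes the opaque interpretation. */
static int
opaque_usable(const struct meta_area *a)
{
	assert(a != NULL);
	return (a->flags & ~MF_OPAQUE) == 0;
}

static int
opaque_set(struct meta_area *a, const uint8_t *in)
{
	if (!opaque_usable(a))
		return -1;	/* someone else's layout; hands off */
	memcpy(a->bytes, in, OPAQUE_LEN);
	a->flags |= MF_OPAQUE;
	return 0;
}

static int
opaque_get(const struct meta_area *a, uint8_t *out)
{
	if (!opaque_usable(a) || (a->flags & MF_OPAQUE) == 0)
		return -1;	/* foreign layout, or nothing stored */
	memcpy(out, a->bytes, OPAQUE_LEN);
	return 0;
}
```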
> > About (1), most of the proposals have solved a different problem than the
> > one we have in mind.
>
> Right, and that's what's wrong with your proposal. Making a major
> change to the kernel that purports to be generic but in fact is not is
> a mistake. If you don't want it to be generic, you should simply
> reserve the space you need for your application and duke it out that
> way. If you can't justify doing it that way, you also can't justify
> doing what you're currently doing.
Since our solution is an overlay fs, we can't reach into the ffs inode.
ffs (or lfs) has to have a way to give us this info. To do otherwise is an
even larger hack.
> The problem is not the API. The problem is the underlying data
> structure. If we do the underlying data structure wrong, a future
> change to the API will not cure the problem.
Not true. An API which supports variable length storage can just pick up
these ops as implicitly referring to 96-byte blobs.
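Roughly like this, assuming some future variable-length store. The names
(struct var_store, var_set, var_get, blob_set, blob_get) and the 256-byte
capacity are invented for the sketch; only the 96-byte size comes from the
current proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPAQUE_LEN 96	/* size the current ops imply */
#define STORE_MAX  256	/* hypothetical capacity of the later store */

/* Hypothetical variable-length per-inode store. */
struct var_store {
	size_t  len;			/* 0 means nothing stored */
	uint8_t bytes[STORE_MAX];
};

/* General op: store len bytes. */
static int
var_set(struct var_store *s, const void *buf, size_t len)
{
	assert(s != NULL);
	if (len == 0 || len > STORE_MAX)
		return -1;
	memcpy(s->bytes, buf, len);
	s->len = len;
	return 0;
}

/* General op: fetch exactly len bytes; fails on a size mismatch. */
static int
var_get(const struct var_store *s, void *buf, size_t len)
{
	assert(s != NULL);
	if (s->len != len)
		return -1;
	memcpy(buf, s->bytes, len);
	return 0;
}

/* The fixed-size ops become thin wrappers with an implied length of 96. */
static int
blob_set(struct var_store *s, const uint8_t *blob)
{
	return var_set(s, blob, OPAQUE_LEN);
}

static int
blob_get(const struct var_store *s, uint8_t *blob)
{
	return var_get(s, blob, OPAQUE_LEN);
}
```

Callers of the old ops never see the length argument, so they keep working
unchanged on top of the newer store.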
Take care,
Bill