Subject: Re: Supporting sector size != DEV_BSIZE
To: None <tech-kern@netbsd.org>
From: Trevin Beattie <trevin@xmission.com>
List: tech-kern
Date: 06/10/2002 09:46:07

At 10:34 PM 6/10/2002 +0700, Robert Elz wrote:
> Date: Fri, 07 Jun 2002 15:52:43 -0700
> From: Trevin Beattie <trevin@xmission.com>
> Message-ID: <3.0.32.20020607155218.00690eb0@clay.wh.ca.us>
>
> | I've been poring over the mkfs.c code to make sure there are no references
> | to DEV_BSIZE (as that value is not stored on disk), and I came across the
> | following at line 899:
> |
> | node.di_blocks = btodb(fragroundup(&sblock, node.di_size));
> |
> | A quick grep through the ffs code confirmed that di_blocks is assumed to be
> | in units of DEV_BSIZE. This may cause problems if, as Bill has suggested,
> | a drive is transferred to another system where the kernel's value of
> | DEV_BSIZE is different. Am I mistaken, or does anyone have a suggestion on
> | how to fix it without breaking existing implementations? Why doesn't
> | di_blocks simply count the number of fragments?
>
>Since no-one else answered this, and since di_blocks was my code
>originally, a long long time ago, perhaps I can explain what is
>going on there.
>
>The theory is that di_blocks is in very well known constant units,
>that are known by everyone. If the count was in units of fragments
>then applications (like du, ls) that extract the information would
>need to have a way to know the fragment size before they could use
>the information.

That makes perfect sense, and answers my last question.

>
>So, di_blocks isn't really supposed to be in DEV_BSIZE units, it is
>supposed to be in 512 byte block units. But it happens that at the
>time, DEV_BSIZE==512 was one of the unchangeable constants of the
>universe (kind of like pi=3.14159... or NULL=0) and the distinction
>between things wanting the constant number, and things wanting device
>blocksize units was very much blurred (as you're discovering).
>
>An alternative implementation might have had di_bsize recorded in
>frags in the filesystem, and then converted to a well known constant
>unit in stat() (etc) before being made visible to average userland
>utilities. That was never considered at the time this was being
>implemented, there simply was no motivation to look into things that
>deeply.
>
>What's important I guess is that stat() returns a count of 512 byte
>blocks, however you want to make the filesystem (and filesystem
>cognisant utilities) behave here.

This got me thinking about what the POSIX and SUSv2 standards have to
say about the stat() function, so I poked around some drafts I have. The
1990 POSIX standard, BTW, does not include st_blocks in struct stat; this
was added in the 200x version. There is also another new member,
st_blksize, which is defined as "the preferred I/O block size for this
object". But strangely, the definition of st_blocks as the "number of
blocks allocated for this object" does not say what the size of those
blocks is, esp. whether the blocks are a constant size or in terms of
st_blksize. The definition of the data type blkcnt_t is even more vague:
"Used for file block counts." :-P

The only reference to a specific size that I could find was in the
rationale section for the du(1) program:

"The use of 512-byte units is historical practice and maintains
compatibility with ls and other utilities in this volume of IEEE Std
1003.1-200x. This does not mandate that the file system itself be based on
512-byte blocks. The -k option was added as a compromise measure. It was
agreed by the standard developers that 512 bytes was the best default unit
because of its complete historical consistency on System V (versus the
mixed 512/1024-byte usage on BSD systems), and that a -k option to switch
to 1024-byte units was a good compromise. Users who prefer the 1024-byte
quantity can easily alias du to du -k without breaking the many historical
scripts relying on the 512-byte units.

"The -b option was added to an early proposal to provide a resolution to
the situation where System V and BSD systems give figures for file sizes in
blocks, which is an implementation-defined concept. (In common usage, the
block size is 512 bytes for System V and 1024 bytes for BSD systems.)
However, -b was later deleted, since the default was eventually decided as
512-byte units."
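
To make the practical consequence concrete: since the drafts give no unit
for st_blocks, a userland program that wants "bytes allocated" has to assume
one. Below is a minimal sketch, assuming st_blocks counts S_BLKSIZE-byte
(512-byte) units as in historical practice. The helper name allocated_bytes()
is made up for illustration, and the 512 fallback is my assumption for
systems whose <sys/stat.h> does not define S_BLKSIZE.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* S_BLKSIZE is not in the standard drafts; fall back to the historical 512. */
#ifndef S_BLKSIZE
#define S_BLKSIZE 512
#endif

/*
 * Hypothetical helper: store the number of bytes allocated to `path`
 * in *bytes, ASSUMING st_blocks is counted in S_BLKSIZE-byte units.
 * Returns 0 on success, -1 if stat() fails (errno is left as set).
 */
int
allocated_bytes(const char *path, unsigned long long *bytes)
{
	struct stat sb;

	if (stat(path, &sb) == -1)
		return -1;

	/* Deliberately not st_blksize: that is only the preferred I/O size. */
	*bytes = (unsigned long long)sb.st_blocks * S_BLKSIZE;
	return 0;
}
```

If st_blocks were instead counted in fragments or in st_blksize units, every
caller like this one would silently report the wrong size, which is the
argument for keeping the unit constant at the stat() boundary.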

Neither of the standard drafts I looked at mentions the macro S_BLKSIZE,
but we have it in <sys/stat.h> defined as 512. Would there be any
objection to replacing btodb() with an expression using S_BLKSIZE
everywhere that di_blocks is used?
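
Roughly what I have in mind, as a standalone sketch rather than a patch:
bytes_to_stblocks(), frag_round_up() and di_blocks_for() are names I made up
for illustration. frag_round_up() only stands in for fragroundup(&sblock, ...)
and assumes the fragment size is a power of two no smaller than S_BLKSIZE.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Fall back to 512 where <sys/stat.h> does not provide S_BLKSIZE. */
#ifndef S_BLKSIZE
#define S_BLKSIZE 512
#endif

/* Hypothetical stand-in for btodb() wherever di_blocks is computed:
 * the unit is fixed at S_BLKSIZE bytes regardless of DEV_BSIZE. */
#define bytes_to_stblocks(bytes)	((uint64_t)(bytes) / S_BLKSIZE)

/* Stand-in for fragroundup(): round size up to a multiple of fsize. */
static uint64_t
frag_round_up(uint64_t size, uint64_t fsize)
{
	/* Assumption: fsize is a power of two and at least S_BLKSIZE. */
	assert(fsize >= S_BLKSIZE && (fsize & (fsize - 1)) == 0);
	return (size + fsize - 1) & ~(fsize - 1);
}

/* Equivalent of the mkfs.c expression, with the unit made explicit. */
uint64_t
di_blocks_for(uint64_t di_size, uint64_t fsize)
{
	return bytes_to_stblocks(frag_round_up(di_size, fsize));
}
```

With a 1024-byte fragment, a 1-byte file rounds up to one fragment and
di_blocks_for(1, 1024) gives 2, whatever the kernel's DEV_BSIZE is. The
division is exact because the rounded size is always a multiple of the
fragment size, which is itself a multiple of S_BLKSIZE under the assumption
above.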
-----------------------
Trevin Beattie "Do not meddle in the affairs of wizards,
trevin@xmission.com for you are crunchy and good with ketchup."
{:-> --unknown