Subject: Re: Supporting sector size != DEV_BSIZE
To: Bill Studenmund <wrstuden@netbsd.org>
From: Darrin B. Jewell <jewell@mit.edu>
List: tech-kern
Date: 06/24/2002 23:43:59
It sounds like you have uncovered the same issues I noticed. My
philosophy about the appropriate route to follow centers around two
points:
1. The compiled-in value for DEV_BSIZE should always be 512.
2. Existing media precedent should be followed to decide where to
change current uses of the DEV_BSIZE constant.
I probably should have made this clearer in my original mail. My
reasoning for point 1 is that this value is not retrieved from
persistent media, so it should not be changed from its current value.
My reasoning for point 2 is to avoid introducing arbitrary
incompatibilities or accidentally setting new precedent.
Bill Studenmund <wrstuden@netbsd.org> writes:
| > If I recall from my investigation there were at least
| > the following potentially independent sources of block size:
| > . units based on a 512 byte DEV_BSIZE
| > . units based on the ffs superblock (see FFS_DEV_BSIZE below)
|
| Note: those are file system blocks aka frags.
I would like to carefully assert that my definition of FFS_DEV_BSIZE
is explicitly not the file system fragment size. Under my definition,
the file system fragment size in bytes is given by fs->fs_fsize, which
is a separate quantity. Even our current newfs sources set the default
fragment size to 1024, which differs from the 512-byte FFS_DEV_BSIZE
of a file system made on ordinary 512-byte-sector disks.
I also _always_ define the kernel constant DEV_BSIZE to be 512 and
_never_ use a different value for it. Treating it as a fundamental
constant that never changes and is never retrieved from persistent
media makes it an independent unit.
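To make the distinction concrete, here is a small user-space sketch
that evaluates the FFS_DEV_BSHIFT/FFS_DEV_BSIZE definitions quoted
below against two superblocks. The struct fs_sketch type is a
hypothetical stand-in holding only the superblock fields involved. The
default-newfs values assume a 1024-byte fragment made on 512-byte-sector
media (so fsbtodb is 1); the second set is taken from the NeXTstep 3.3
dumpfs output at the end of this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Fundamental, compiled-in constant: never read from media. */
#define DEV_BSHIFT 9
#define DEV_BSIZE  (1 << DEV_BSHIFT)	/* always 512 */

/* Hypothetical stand-in for struct fs: only the fields used below. */
struct fs_sketch {
	int32_t fs_fsize;	/* fragment size in bytes */
	int32_t fs_fshift;	/* log2(fs_fsize) */
	int32_t fs_fsbtodb;	/* log2(fs_fsize / fs-declared device block) */
};

/* Definitions as proposed in this thread. */
#define FFS_DEV_BSHIFT(fs)	((fs)->fs_fshift - (fs)->fs_fsbtodb)
#define ffs_btodb(fs, b)	((b) >> FFS_DEV_BSHIFT(fs))
#define ffs_dbtob(fs, db)	((db) << FFS_DEV_BSHIFT(fs))
#define FFS_DEV_BSIZE(fs)	ffs_dbtob(fs, 1)

/* Assumed default newfs: 1024-byte frags on 512-byte sectors. */
static const struct fs_sketch default_newfs = { 1024, 10, 1 };

/* NeXTstep 3.3 CD, from dumpfs: fsize 2048, shift 11, fsbtodb 0. */
static const struct fs_sketch ns33cd = { 2048, 11, 0 };
```

With the default parameters the fragment is 1024 bytes while
FFS_DEV_BSIZE evaluates to 512, so the two really are independent
units; on the NeXTstep CD they happen to coincide at 2048, and neither
case changes DEV_BSIZE.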
| > . units based on the disklabel d_secsize
| > ( this should always match the hardware device)
|
| Note: the latter isn't necessarily true. If you take a disk image & move
| it to another system, it may change. Folks wish to continue using the
| disklabel number.
This is why I mentioned it. I am not as adamant about this,
but I was thinking that the in-core value of this field
should always match the hardware sector size. Currently,
the device strategy routines use d_secsize to interpret
bp->b_blkno. If d_secsize does not match the hardware sector
size, then the device strategy routines will need to be
modified to do the appropriate conversion.
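As a rough sketch of the conversion a strategy routine would need if
the in-core d_secsize were allowed to differ from the hardware sector
size: the helper below is hypothetical (it is not an existing NetBSD
interface) and assumes both sizes are powers of two, that b_blkno is
counted in d_secsize units as described above, and that a request must
start on a hardware sector boundary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: translate a block number counted in units of
 * the disklabel's d_secsize into one counted in units of the device's
 * real sector size.  Returns -1 if the request does not begin on a
 * hardware sector boundary; a caller would fail such a buf with EINVAL.
 */
static int64_t
label_blkno_to_hw(int64_t blkno, uint32_t d_secsize, uint32_t hw_secsize)
{
	int64_t byte_off;

	/* Both sizes are assumed to be nonzero powers of two. */
	assert(d_secsize != 0 && (d_secsize & (d_secsize - 1)) == 0);
	assert(hw_secsize != 0 && (hw_secsize & (hw_secsize - 1)) == 0);

	byte_off = blkno * (int64_t)d_secsize;
	if (byte_off % hw_secsize != 0)
		return -1;	/* misaligned for the hardware */
	return byte_off / hw_secsize;
}
```

For example, a 512-byte label block 8 on a 2048-byte-sector CD maps to
hardware sector 2, while label block 3 cannot be expressed and has to
be rejected (or handled with a read-modify-write, which this sketch
does not attempt).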
| > At the time, I found the following definitions useful:
| >
| > #define FFS_DEV_BSHIFT(fs) ((fs)->fs_fshift-(fs)->fs_fsbtodb)
|
| That should be a constant in the ufs mount structure (the in-kernel
| thing). We don't need to subtract those constants every time; they aren't
| going to change.
That would be acceptable, although the optimization you suggest is
completely in the noise and adds considerable unnecessary complexity.
There are already lots of cases in fs.h that do this kind of
extra math repeatedly at run time.
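For comparison, a sketch of the two approaches side by side. The
struct names and the um_devbshift field are hypothetical, not existing
NetBSD fields; the point is that caching saves only one subtraction per
conversion, and both forms produce the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal superblock: only the fields the shift needs. */
struct fs_min {
	int32_t fs_fshift;
	int32_t fs_fsbtodb;
};

/* Hypothetical in-core mount structure carrying a cached shift. */
struct ufsmount_min {
	const struct fs_min *um_fs;
	int32_t um_devbshift;	/* cached fs_fshift - fs_fsbtodb */
};

/* Recompute on every use, as many fs.h macros already do. */
static int64_t
btodb_computed(const struct fs_min *fs, int64_t bytes)
{
	return bytes >> (fs->fs_fshift - fs->fs_fsbtodb);
}

/* Fill in the cached shift once, e.g. at mount time. */
static void
ump_init(struct ufsmount_min *ump, const struct fs_min *fs)
{
	ump->um_fs = fs;
	ump->um_devbshift = fs->fs_fshift - fs->fs_fsbtodb;
}

/* Use the cached shift. */
static int64_t
btodb_cached(const struct ufsmount_min *ump, int64_t bytes)
{
	return bytes >> ump->um_devbshift;
}
```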
| > #define ffs_btodb(fs, b) ((b) >> FFS_DEV_BSHIFT(fs))
| > #define ffs_dbtob(fs, db) ((db) << FFS_DEV_BSHIFT(fs))
| > #define FFS_DEV_BSIZE(fs) ffs_dbtob(fs,1)
| >
| > I remember facing a couple of decisions about what units
| > quotas and free block counts were kept in. Can you brief me
| > on decisions you made regarding these counters when authoring
| > your patches? Do you have a rational for your choice?
|
| He chose the design philosophy I (ehm strongly) suggested. :-)
Careful: we need to keep compatibility with other vendors who
have already made this choice, for better or worse. I have to
go back and reconfirm what choice Apple/NeXT made here. Did Sun,
DEC, or anyone else also set a precedent for this case? At the
end of this email, I include the partial dumpfs output for the
NeXTstep 3.3 OS distribution CD. You can see some of the
choices they made by examining the superblock values.
I can provide the rest of the dumpfs output if someone wants
to look at it. This was generated by the unmodified dumpfs
currently in our source tree.
| NetBSD 1.5 only supported file systems on disks where the physical sector
| size == DEV_BSIZE. So anything that uses DEV_BSIZE and is stored on-disk
| should really use the fs-declared sector size (FFS_DEV_BSHIFT() above,
| though Trevin used a different name). When formatting a file system,
| "FFS_DEV_BSHIFT()" will equal the sector size of the media. That fs can
| later be moved around, because the superblock has enough info to recreate
| "FFS_DEV_BSHIFT()".
I agree that when a filesystem is created, FFS_DEV_BSHIFT will
normally correspond to the sector size of the media, but that
it may no longer match the media once the filesystem has been
moved around.
I think most current uses of DEV_BSIZE need to be examined
to determine whether they should use FFS_DEV_BSIZE, d_secsize,
or a DEV_BSIZE constant of 512.
| > Do you agree with my list of independent sources of block
| > size? Are there any other fundamental ones not derived
| > from the above three? Should we create a list of derived
| > indications of block size and which fundamental block
| > size they should be derived from?
|
| There actually is one more. The buffer cache is kept in units of
| DEV_BSIZE. You can have a file system that was made with a DEV_BSIZE=1024
| get moved to a kernel with DEV_BSIZE=512. After these changes, we want
| that fs to work. So that means that when translating ffs_btodb() outputs
| to buffer cache offsets, we need to use a conversion to bridge between
| them. :-)
I think this is the case I addressed above, where I discussed
modifying the hardware device strategy routines.
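A minimal sketch of the bridge Bill describes, assuming the buffer
cache keeps b_blkno in fixed 512-byte DEV_BSIZE units and that the
filesystem's device block (1 << FFS_DEV_BSHIFT) is never smaller than
DEV_BSIZE. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSHIFT 9	/* DEV_BSIZE is always 512 */

/* FFS device-block number (ffs_btodb units) -> DEV_BSIZE units. */
static int64_t
ffsdb_to_devb(int64_t ffs_db, int ffs_dev_bshift)
{
	assert(ffs_dev_bshift >= DEV_BSHIFT);
	return ffs_db << (ffs_dev_bshift - DEV_BSHIFT);
}

/* DEV_BSIZE units -> FFS device-block number; must be aligned. */
static int64_t
devb_to_ffsdb(int64_t devb, int ffs_dev_bshift)
{
	int shift = ffs_dev_bshift - DEV_BSHIFT;

	assert(shift >= 0);
	/* The DEV_BSIZE block must start on an FFS device block. */
	assert((devb & ((INT64_C(1) << shift) - 1)) == 0);
	return devb >> shift;
}
```

On the NeXTstep CD (FFS_DEV_BSHIFT of 11), FFS device block 140 would
live at DEV_BSIZE block 560 in the buffer cache; on a file system made
with 512-byte sectors the shift is zero and the numbers are unchanged.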
Thanks,
Darrin
As I mentioned, here is the partial dumpfs output from a
NeXTstep 3.3 operating system distribution CD:
# dumpfs ns33cd.ufs | head -22
file system: ns33cd.ufs
endian big-endian
magic 11954 time Sat Nov 12 00:44:21 1994
id [ 0 0 ]
cylgrp static inodes 4.2/4.3BSD fslevel 0 softdep disabled
nbfree 1406 ndir 3168 nifree 71290 nffree 51
ncg 45 ncyl 89 size 182272 blocks 176323
bsize 8192 shift 13 mask 0xffffe000
fsize 2048 shift 11 mask 0xfffff800
frag 4 shift 2 fsbtodb 0
cpg 2 bpg 1024 fpg 4096 ipg 1984
minfree 10% optim time maxcontig 20000 maxbpg 512
rotdelay 0ms rps 5
ntrak 32 nsect 64 npsect 0 spc 2048
symlinklen -1 trackskew 0 interleave 0 contigsumsize -1
maxfilesize 0xffffffffffffffff
nindir 2048 inopb 64 nspf 1
avgfilesize -1 avgfpdir -1
sblkno 8 cblkno 12 iblkno 16 dblkno 140
sbsize 2048 cgsize 2048 offset 64 mask 0xffffffe0
csaddr 140 cssize 2048 shift 9 mask 0xfffffe00
cgrotor 42 fmod 0 ronly 0 clean 0x01