tech-kern archive
the bouyer-quota2 branch
Hi,
I think the code in the bouyer-quota2 branch is now stable and
ready to be merged to HEAD. Unless there are objections, I'll merge it in
about 2 weeks.
To get a diff:
cvs -d anoncvs@anoncvs.netbsd.org:/cvsroot rdiff -kk -u -r bouyer-quota2-base -r bouyer-quota2 src
This branch is for the development of a modernized disk quota system.
The 2 main changes are a new quotactl(2) interface and a new on-disk
format, compatible with journaled ffs.
The new quotactl(2) uses a plist format to send commands and exchange data
with the kernel. Using plists for this has several advantages:
- the plist format can change without the need to version the syscall;
only the plist parser needs to be changed, and backward compatibility can be
handled at the parser level.
- the plist format can easily be extended to fit other filesystems than
ufs.
- it is easy to pass it back to puffs servers
- it is easy to use in scripts.
The format used is documented in quotactl(2). A new quotactl(8) command
has been added, which allows sending and receiving plists from userland;
the idea is to make it easier to manage quotas from scripts.
The branch has code under COMPAT_50 to deal with the old syscall.
The in-tree quota commands quota(1), edquota(8), repquota(8), rpc.rquotad(8),
quotacheck(8), quotaon(8) have been updated to use the new syscall interface.
I also took this opportunity to change the semantics of the values reported by
these utilities (which are also the values used in plists): 0
now means "nothing allowed" (instead of 1, as is currently the case), and "no limit"
is represented by the string "-" or "unlimited" (in the plist as well as in the new
on-disk format this is UQUAD_MAX, i.e. 0xffffffffffffffff). The old disk format
still uses 0 as unlimited and 1 as nothing allowed; the semantic difference
is handled in the kernel and in the userland conversion utilities (see quota1_subr.c).
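To illustrate the change of semantics described above, here is a minimal sketch in Python. It is not the code from quota1_subr.c: the function names are hypothetical, and it assumes that the off-by-one of the old format applies to every finite limit (old value N corresponds to new value N - 1), which the text only states explicitly for the "nothing allowed" case. It also ignores the narrower range of the old on-disk fields.

```python
# "No limit" in the plist and in the new on-disk format, as stated in the text.
UQUAD_MAX = 0xFFFFFFFFFFFFFFFF


def quota1_to_quota2_limit(old: int) -> int:
    """Convert an old-format limit to the new semantics (hypothetical helper)."""
    if old == 0:
        # Old format: 0 means unlimited.
        return UQUAD_MAX
    # Old format: 1 means nothing allowed, which is 0 in the new semantics.
    # Assumption: the same shift by one applies to every other finite value.
    return old - 1


def quota2_to_quota1_limit(new: int) -> int:
    """Convert a new-style limit back to the old format (hypothetical helper)."""
    if new == UQUAD_MAX:
        return 0
    return new + 1


def format_limit(new: int) -> str:
    """Render a new-style limit the way the text says the utilities report it."""
    return "unlimited" if new == UQUAD_MAX else str(new)
```

With these helpers, an old limit of 1 converts to 0 ("nothing allowed") and an old limit of 0 prints as "unlimited".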
repquota gains a -x option, which exports the quotas as a "set" plist command
which can be fed directly to quotactl(8). This is one way to move limits
from one fs to another (or convert to the new on-disk format).
A new on-disk format has been added (called quota2, see quota2.h).
The usages and limits are stored in unlinked inodes (one for user and one
for group quotas); they can no longer be stored outside of the filesystem.
This ensures that quotas are covered by the filesystem clean flag or journal.
A quota file has a header containing some persistent parameters, a default
quota entry, and the free list and hash lists of quota entries. The quota file is not
sparse; quota entries are held in hash lists. The kernel keeps a cache of
quota entries, which keeps their offsets in the file to avoid walking the list
on each lookup.
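The lookup scheme described above (hash lists in the file plus an in-kernel cache of offsets) can be modelled roughly as follows. This is only a toy sketch in Python: the class and field names, the entry size of 64 and the modulo hash are all assumptions made for illustration, and the real layout is the one defined in quota2.h.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    ident: int      # uid or gid this entry belongs to
    next_off: int   # offset of the next entry in the same hash chain, 0 = end


class ToyQuotaFile:
    """Toy model: hash chains of entries, addressed by file offset."""

    ENTRY_SIZE = 64  # hypothetical size, not taken from quota2.h

    def __init__(self, hash_size: int = 4) -> None:
        self.hash_heads: List[int] = [0] * hash_size  # stands in for the header
        self.entries: Dict[int, Entry] = {}           # offset -> entry ("the file")
        self.cache: Dict[int, int] = {}               # id -> offset ("the kernel cache")
        self.next_off = self.ENTRY_SIZE
        self.walk_steps = 0                           # chain entries visited so far

    def insert(self, ident: int) -> int:
        """Add an entry at the head of its hash chain and return its offset."""
        off = self.next_off
        self.next_off += self.ENTRY_SIZE
        bucket = ident % len(self.hash_heads)
        self.entries[off] = Entry(ident, self.hash_heads[bucket])
        self.hash_heads[bucket] = off
        return off

    def lookup(self, ident: int) -> Optional[int]:
        """Return the offset of the entry for ident, or None if there is none."""
        cached = self.cache.get(ident)
        if cached is not None:
            return cached  # cache hit: no list walk needed
        off = self.hash_heads[ident % len(self.hash_heads)]
        while off != 0:
            self.walk_steps += 1
            if self.entries[off].ident == ident:
                self.cache[ident] = off  # remember the offset for next time
                return off
            off = self.entries[off].next_off
        return None
```

In this model the first lookup of an id walks its hash chain, and later lookups of the same id are answered from the cached offset.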
This new format has grown 64-bit limits and usages (32 bits are not enough for
modern storage sizes), and 2 new features:
- a default quota entry is used as a template for new quota entries allocated
when a new uid/gid shows up on the filesystem. This template is configurable,
so that a sysadmin can choose what to allow to unknown users.
- per-user/group grace time.
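The two features listed above can be pictured with a small Python sketch. All names here (QuotaEntry, QuotaTable, the field names) are hypothetical and only model the behaviour described in the text: a new id gets a copy of the default entry, and each entry carries its own grace time.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass
class QuotaEntry:
    soft_limit: int
    hard_limit: int
    grace_seconds: int   # per-entry grace time
    usage: int = 0


class QuotaTable:
    """Toy model of quota entries created from a configurable default."""

    def __init__(self, default: QuotaEntry) -> None:
        self.default = default
        self.entries: Dict[int, QuotaEntry] = {}

    def entry_for(self, ident: int) -> QuotaEntry:
        """Return the entry for ident, allocating it from the template if new."""
        if ident not in self.entries:
            # Copy the template; a fresh entry starts with no usage.
            self.entries[ident] = replace(self.default, usage=0)
        return self.entries[ident]
```

For example, changing grace_seconds on one user's entry leaves both the template and the entries of other users untouched.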
Quotas are enabled with tunefs -q user and/or -q group (and disabled with
-q nouser -q nogroup), or at newfs time with the same -q option.
After a tunefs -q, an fsck of the filesystem is required.
There is no quotacheck/quotaon anymore for quota version 2. Quota usages
are checked in fsck_ffs(8) at the same time as other filesystem metadata.
Usages are computed in phase 1 (and adjusted in other phases if fsck needs to
create or delete files, or change block allocations) and checked against
recorded usages in phase 6. Phase 6 will also do other consistency checks
against the quota inodes, or even create them if none exist (e.g. just
after a tunefs). While doing this I discovered some pieces missing in
fsck_ffs about block accounting when allocating inodes and blocks,
which I fixed (this is why ffs_clusteracct() moved to ffs_subr.c;
as a bonus it's one less function replicated in makefs(8)).
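The compute-then-compare idea behind the fsck_ffs check can be sketched as below. This is a simplified Python model, not the fsck_ffs code: the function names and the (owner, blocks) input format are assumptions, and it leaves out the adjustments made in the intermediate phases.

```python
from typing import Dict, Iterable, Tuple

Usage = Tuple[int, int]  # (blocks, inodes)


def compute_usage(inodes: Iterable[Tuple[int, int]]) -> Dict[int, Usage]:
    """Sum blocks and inode counts per owner from (owner_id, nblocks) pairs."""
    blocks: Dict[int, int] = {}
    count: Dict[int, int] = {}
    for owner, nblocks in inodes:
        blocks[owner] = blocks.get(owner, 0) + nblocks
        count[owner] = count.get(owner, 0) + 1
    return {owner: (blocks[owner], count[owner]) for owner in blocks}


def check_recorded(computed: Dict[int, Usage],
                   recorded: Dict[int, Usage]) -> Dict[int, Tuple[Usage, Usage]]:
    """Return {owner: (recorded, computed)} for every owner whose usage differs."""
    mismatches = {}
    for owner in sorted(set(computed) | set(recorded)):
        want = computed.get(owner, (0, 0))
        have = recorded.get(owner, (0, 0))
        if want != have:
            mismatches[owner] = (have, want)
    return mismatches
```

An empty result from check_recorded() corresponds to quota usages that agree with what the scan of the inodes found.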
Instead of keeping usages in memory, synced to disk on sync or
at umount time, quota usages are now updated as other metadata in
real time (or delayed write, depending on mount options). This way,
quota usages are also covered by the journal (usage update is in the same
WAPBL transaction as the one allocating/freeing inodes or blocks),
and so usages should be accurate after a log replay (quotacheck(8) is
basically a pass 1 fsck, and the time required for today's storage sizes
is just not acceptable).
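Why putting the usage update in the same transaction keeps usages accurate after a log replay can be shown with a toy model. The Python sketch below is not WAPBL: the ToyJournal class, the key names and the idea of logging new values per key are all simplifications made up for the illustration.

```python
from typing import Dict, List


class ToyJournal:
    """Toy journal: each committed transaction is a set of (key -> new value)."""

    def __init__(self) -> None:
        self.committed: List[Dict[str, int]] = []

    def commit(self, writes: Dict[str, int]) -> None:
        # All writes of one transaction become durable together.
        self.committed.append(dict(writes))


def allocate_blocks(journal: ToyJournal, state: Dict[str, int],
                    uid: int, nblocks: int) -> None:
    """Allocate blocks and update the owner's quota usage in ONE transaction."""
    usage_key = f"quota_usage:{uid}"
    writes = {
        "blocks_allocated": state.get("blocks_allocated", 0) + nblocks,
        usage_key: state.get(usage_key, 0) + nblocks,
    }
    journal.commit(writes)
    state.update(writes)


def replay(journal: ToyJournal, disk: Dict[str, int]) -> Dict[str, int]:
    """Apply every committed transaction, in order, to the on-disk state."""
    for txn in journal.committed:
        disk.update(txn)
    return disk
```

Because the allocation and the usage update are never logged separately, a replay in this model cannot produce one without the other.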
This code has been tested in several ways. In addition to the atf
tests in the branch testing basic functionality (as well as some
corruption scenarios for fsck_ffs), I did stress tests on a XEN3_DOMU
with 256MB of RAM as well as on a dual-core i5 (with hyperthreading, so the
kernel sees 4 CPUs) with 2GB of RAM. One of the stress tests was
to run 5 bonnie++ in a loop under 5 different uids, while at the same time
running quota(1), repquota(8) and quotactl(8) commands in loops, on both
logged and non-logged filesystems.
I also ran a bonnie++ in a loop while taking and deleting snapshots
of the filesystems, also in loops. All issues discovered this way have been
fixed.
In order to have fsck_ffs against a snapshot report no errors, I had to make a
wider change. I added a per-inode flag, "SF_SNAPINVAL", used to mark a
snapshot inode as invalid. Right now, a snapshot inode shows up as a
0-size regular file in the snapshot, and userland tools don't know it is
a snapshot inode. The result is that quota usages are miscomputed by
fsck_ffs as snapshot inodes are not included in usage. Now snapshot inodes
in the snapshot are marked SF_SNAPSHOT | SF_SNAPINVAL, so userland tools
know it's a snapshot (as a bonus, dump can ignore them as well), while
the kernel can deny using it as a snapshot.
I believe this flag can also be used to speed up snapshot creations, but
this won't be investigated as part of the branch.
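The way the two flags combine can be sketched as follows. This is a Python illustration only: the numeric bit values are placeholders chosen for the example and are not copied from the headers, and the two helper functions are hypothetical names for the userland and kernel decisions described above.

```python
# Placeholder bit values for illustration; the real definitions live in the
# system headers and may differ.
SF_SNAPSHOT = 0x1
SF_SNAPINVAL = 0x2


def is_snapshot_inode(flags: int) -> bool:
    """Userland view: the inode is a snapshot, valid or not."""
    return bool(flags & SF_SNAPSHOT)


def usable_as_snapshot(flags: int) -> bool:
    """Kernel view: only a snapshot inode not marked invalid may be used."""
    return bool(flags & SF_SNAPSHOT) and not (flags & SF_SNAPINVAL)
```

A snapshot inode seen inside a snapshot would carry both bits, so tools can recognize it while the kernel refuses to use it.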
Finally, here are some bonnie++ results on the core i5 above (I'll add that
the disk system is a 500GB WDC WD5000AADS-00S9B0 on an ahcisata controller)
used for tests. "plain" is HEAD with plain ffs, "log" the same mounted
with -o log.
"quota1" is "plain" with user quota1 enabled (the quota file is at the
root of the test filesystem), "quota2" is "plain" with the new quota
enabled for user. "quota2log" is "quota2" mounted -o log (quota1 and log
are mutually exclusive).
As you can see there is no measurable performance impact.
Version 1.03e  ------Sequential Output------ --Sequential Input- --Random-
               -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
plain       4G 71199  43 71717  11 30440   5 73216  77 74573   9 183.4   0
plain       4G 71972  44 71906  12 30446   6 73959  77 74637   9 178.4   0
plain       4G 71922  44 71800  11 30438   6 73756  77 74669   9 177.8   0
log         4G 69776  43 71641  13 30732   6 73709  77 74653   9 176.1   0
log         4G 71254  44 71404  12 30548   6 73968  77 74653   9 176.5   0
log         4G 71183  44 71581  13 30499   6 73400  77 74812   9 176.5   0
quota1      4G 70320  43 71792  12 30694   6 73787  77 74637   9 180.3   0
quota1      4G 71567  43 71772  12 30781   6 73774  77 74541   9 178.8   0
quota1      4G 71829  44 71669  12 30393   5 73324  77 74796   9 179.1   0
quota2      4G 70349  43 71311  12 30502   5 71670  75 74636   9 181.2   0
quota2      4G 72125  44 71486  12 30560   6 73385  77 74621   9 178.0   0
quota2      4G 71411  43 71379  12 30606   6 73772  77 74621   9 179.9   0
quota2log   4G 69453  43 71947  13 30700   6 73554  77 74748   9 177.7   0
quota2log   4G 70718  44 71635  13 30433   6 74192  78 74716   9 174.3   0
quota2log   4G 72394  45 71641  13 30681   6 73601  77 74684   9 177.7   0

               ------Sequential Create------ --------Random Create--------
               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min   /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
plain       16  1693  25 +++++ +++  5220  14  1835  27 12712  99  3718  28
plain       16  1707  25 +++++ +++  5225  14  1783  26 12647  99  3665  28
plain       16  1800  26 +++++ +++  5127  15  1830  27 12697  99  3402  26
log         16  8687  88 +++++ +++ +++++ +++ 10006  99 12608  99 23303  99
log         16  9051  91 +++++ +++ +++++ +++ 10014  99 12652  99 23148  99
log         16  9868  99 +++++ +++ +++++ +++ 10027  99 12675  99 23300 100
quota1      16  1639  24 +++++ +++  5220  14  1704  25 12713 100  3614  27
quota1      16  1718  25 +++++ +++  5222  14  1628  24 12744 100  3659  28
quota1      16  1742  25 +++++ +++  4535  13  1854  27 12643  99  3720  28
quota2      16  1729  25 +++++ +++  5188  15  1940  28 12626  99  3743  29
quota2      16  1839  27 +++++ +++  5178  15  1750  25 12699  99  3647  28
quota2      16  1755  26 +++++ +++  5208  15  1739  25 12570  99  3581  27
quota2log   16  9227  94 +++++ +++ +++++ +++  9957  99 12686  99 23035 100
quota2log   16  9807  99 +++++ +++ +++++ +++  9252  92 12649  99 23301  99
quota2log   16  9789  99 +++++ +++ +++++ +++  9263  93 12682  99 23032  99
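As a rough cross-check of the "no measurable performance impact" statement, the sequential block output figures (K/sec) from the first table can be averaged per configuration and compared with "plain". The Python sketch below uses only numbers copied from that table; the helper names are hypothetical, and three runs per configuration are of course too few for any statistical claim.

```python
from statistics import mean

# Sequential Output, --Block-- K/sec column, three runs per configuration,
# copied from the first bonnie++ table.
BLOCK_OUTPUT_KPS = {
    "plain":     [71717, 71906, 71800],
    "log":       [71641, 71404, 71581],
    "quota1":    [71792, 71772, 71669],
    "quota2":    [71311, 71486, 71379],
    "quota2log": [71947, 71635, 71641],
}


def mean_throughput(config: str) -> float:
    """Average block output throughput of one configuration, in K/sec."""
    return mean(BLOCK_OUTPUT_KPS[config])


def percent_vs_plain(config: str) -> float:
    """Relative difference of a configuration's mean against "plain", in %."""
    base = mean_throughput("plain")
    return 100.0 * (mean_throughput(config) - base) / base


if __name__ == "__main__":
    for name in BLOCK_OUTPUT_KPS:
        print(f"{name:10s} {mean_throughput(name):9.1f} K/sec "
              f"{percent_vs_plain(name):+.2f}%")
```

On this column every configuration stays within 1% of "plain".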
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 ans d'experience feront toujours la difference