Subject: Re: Porting Hammerfs (fwd)
To: Bill Stouder-Studenmund <wrstuden@netbsd.org>
From: Matthew Dillon <dillon@apollo.backplane.com>
List: tech-kern
Date: 12/11/2007 11:56:15
:I read that paragraph differently. I don't disagree that it could well be
:a gotcha about porting, but I took it as meaning they didn't have to roll
:a custom cache for their metadata stuff. They have btrees, so they have
:more complicated in-core structures than say ffs does.
:
:I could be wrong though...
:
:Take care,
:
:Bill
Yes, that's it exactly. B-Trees can be reasonably well cached but
you can still wind up with a lot of code overhead if you have to
dive into the OS and issue a lookup (bread() in the case of DFly/FBsd)
every time you want to access a B-Tree node. e.g. having to scan a
6-deep B-Tree might require 6 bread()s JUST to find the B-Tree element,
then another bread() to access the data. FFS requires maybe 1/3 the
bread() calls to access the same data.
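
To make the lookup count above concrete, here is a minimal sketch in C.
It is not HAMMER or FFS code: sim_bread(), struct node and lookup_cost()
are hypothetical names invented for this illustration, and the toy tree
has a single child per level so that only the depth matters.

```c
#include <stddef.h>

static int bread_calls;              /* counts simulated buffer lookups */

struct node { struct node *child; }; /* toy node: one child per level */

/* Stand-in for bread(); a real one would look up a buffer by block. */
static struct node *sim_bread(struct node *n)
{
    bread_calls++;
    return n;
}

/* Walk from the root down to the leaf, then fetch the data block. */
static int lookup_cost(struct node *root)
{
    bread_calls = 0;
    for (struct node *n = root; n != NULL; n = n->child)
        sim_bread(n);       /* one lookup per B-Tree level */
    bread_calls++;          /* one more lookup for the data itself */
    return bread_calls;
}
```

For a chain of 6 nodes lookup_cost() returns 7: six lookups to reach the
element plus one for the data.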
HAMMER solves this problem by maintaining in-memory tracking structures
for things like B-Tree nodes. Those structures cache a pointer to the
actual on-disk data by pointing directly into the related buffer
cache buffer. The pointers are maintained even after HAMMER is
through with an operation. HAMMER then relies on the OS to tell it when
a buffer cache buffer should be flushed/recycled. If the in-memory
tracking structures are in use (have a non-zero ref count), HAMMER
sets B_LOCKED which tells the OS not to throw away the buffer. If
the in-memory tracking structures are not in use (ref count == 0),
HAMMER disassociates the buffer cache buffer from the structure(s)
and allows the OS to proceed. This removes nearly all the bread()
calls from the critical path.
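
The release decision described above could be sketched roughly as
follows. This is an illustration only, not the actual HAMMER source:
the sketch_* names, the structure layouts and the return convention are
assumptions, and SKETCH_B_LOCKED merely stands in for the kernel's
B_LOCKED flag.

```c
#include <stddef.h>

#define SKETCH_B_LOCKED 0x1    /* stand-in for the kernel's B_LOCKED */

struct sketch_buf {            /* stand-in for a buffer cache buffer */
    int   flags;
    void *data;
};

struct sketch_node {           /* in-memory tracking structure */
    int                refs;   /* non-zero while an operation uses it */
    struct sketch_buf *bp;     /* backing buffer, NULL if disassociated */
    void              *ondisk; /* points directly into bp->data */
};

/*
 * Called when the OS wants to flush/recycle the node's buffer.
 * Returns 1 if the OS may proceed, 0 if the buffer must stay.
 */
static int sketch_buf_release(struct sketch_node *node)
{
    if (node->refs != 0) {
        node->bp->flags |= SKETCH_B_LOCKED;  /* in use: keep the buffer */
        return 0;
    }
    node->ondisk = NULL;  /* drop the cached pointer into the buffer */
    node->bp = NULL;      /* disassociate; next access must look it up */
    return 1;
}
```

While refs is non-zero, callers can keep dereferencing ondisk without
going back to the OS, which is where the saved bread() calls come from.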
HAMMER also relies heavily on caching work in-memory and in the
OS's buffer or VM page cache, in order to be able to flush the work
out in larger chunks. Ultimately I hope to have one B-Tree element
represent potentially huge swaths of data instead of one 16K chunk.
This ultimately is what will make HAMMER seek-efficient. A B-Tree
element in HAMMER is around 64 bytes versus the 4 (or 8) bytes FFS needs
to represent a pointer to a disk block. The more data that B-Tree
element can represent, the better.
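
A small worked example of that trade-off, as a sketch in C. The
function index_bytes() is a hypothetical helper, and treating an FFS
block as 16K for the comparison is an assumption made here so that both
cases map the same chunk size.

```c
#include <stdint.h>

/*
 * Index bytes needed to map data_bytes of file data when each index
 * element is element_size bytes and covers extent_bytes of data.
 */
static uint64_t index_bytes(uint64_t data_bytes, uint64_t extent_bytes,
                            uint64_t element_size)
{
    /* round up so a partial trailing extent still gets an element */
    uint64_t elements = (data_bytes + extent_bytes - 1) / extent_bytes;
    return elements * element_size;
}
```

Mapping 1 GiB in 16K chunks takes 65536 elements: 4 MiB of index at
64 bytes each, against 512 KiB at 8 bytes each. If one 64-byte element
could cover 1 MiB instead, the same 1 GiB would need only 1024 elements,
or 64 KiB.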
That all said, I don't think it would be hard to port the buffer cache
aspects of HAMMER. I have most of the buffer cache ops isolated in
a single support file specifically to make porting easier.
In any case, I really appreciate the interest. I hope to have things
in better shape by mid-January.
-Matt