Subject: Re: MINCLSIZE considered harmful
To: Matthias Scheler <tron@zhadum.de>
From: Jonathan Stone <jonathan@Pescadero.dsg.stanford.edu>
List: tech-net
Date: 08/24/2005 16:47:28
In message <20050824182958.GA21054@colwyn.zhadum.de> Matthias Scheler writes:
>
> Hello,
>
>MINCLSIZE is currently defined like this in "src/sys/sys/mbuf.h":
>
>#define MINCLSIZE (MHLEN+MLEN+1) /* smallest amount to put in cluster */
>
>Network code paying attention to this value will therefore create
>a chain of a packet-header mbuf and a normal mbuf for certain payload
>sizes instead of using a single mbuf cluster.
[...]
Yes, that's basically correct. MINCLSIZE lets the kernel builder
choose a time/space tradeoff. Smaller values of MINCLSIZE result in
more contiguous buffers, at the cost of more wasted space. It follows
that your Subject: line is not quite right: the existence of MINCLSIZE
as a (crudely) tunable option is a Good Thing, and (I think) what
you're taking issue with is merely the current default value of
MINCLSIZE, given certain mis-designed NICs or certain hardware :-),
not the existence of MINCLSIZE per se.
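
To make the time/space tradeoff concrete, here is a small C model of the
placement rule implied by the MINCLSIZE comment: payloads of at least
MINCLSIZE bytes go into a cluster, smaller ones into a packet-header mbuf
plus plain mbufs. It is only a sketch. MSIZE, MLEN, MHLEN and MCLBYTES are
given made-up round values rather than any real port's, and place() and
struct placement are hypothetical names, not kernel code.

```c
#include <assert.h>

/*
 * Hypothetical, illustrative constants. Real values differ by port and
 * depend on sizeof(struct m_hdr) and sizeof(struct pkthdr).
 */
#define MSIZE    256    /* assumed size of one mbuf */
#define MLEN     224    /* assumed data bytes in a plain mbuf */
#define MHLEN    200    /* assumed data bytes in a packet-header mbuf */
#define MCLBYTES 2048   /* assumed cluster size */
#define DEFAULT_MINCLSIZE (MHLEN + MLEN + 1)

struct placement {
    int nfrags;   /* separate data areas a DMA engine must walk */
    int bytes;    /* memory consumed: mbufs plus any cluster */
};

/*
 * Model of the rule "put len bytes in a cluster iff len >= minclsize".
 * Only meaningful for 0 < len <= MCLBYTES.
 */
static struct placement place(int len, int minclsize)
{
    struct placement p;
    int room;

    assert(len > 0 && len <= MCLBYTES);
    if (len >= minclsize) {
        p.nfrags = 1;                 /* one contiguous cluster */
        p.bytes = MSIZE + MCLBYTES;   /* header mbuf + external cluster */
        return p;
    }
    /* Packet-header mbuf first, then plain mbufs until len fits. */
    room = MHLEN;
    p.nfrags = 1;
    while (room < len) {
        room += MLEN;
        p.nfrags++;
    }
    p.bytes = p.nfrags * MSIZE;
    return p;
}
```

With these assumed numbers, a 300-byte payload costs 512 bytes in two
fragments under the historic default, versus 2304 bytes in one contiguous
buffer if MINCLSIZE were lowered to MHLEN: fewer fragments for the NIC to
walk, at four and a half times the memory.
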
>This seems to be bad at least for architectures which (usually) use DMA-capable
>NICs, because the driver either has to use scatter-gather or, even
>worse, copy the data into a contiguous buffer. I therefore wonder if we
>should define MINCLSIZE to MHLEN instead, at least on architectures
>which are not terribly short on memory, e.g. NetBSD-amd64.
If I recall correctly, the historic (MHLEN + MLEN + 1) dates back to
when the default MLEN was 128, and expected memory was much smaller.
Someone who's actually done quantitative studies once told me that an
even better choice was to provide three sizes of mbuf: one around 128
bytes (this was before, er, "legacy" protocols like IPv6); one around
512 bytes; and one for the local Ethernet MTU. I believe the 512-ish
bytes was useful for a variety of NFS requests or responses that were
too big for (MLEN+MHLEN), tho' I may well be recalling data so old that
MLEN was still around 128 bytes. (I vaguely recall that grabbing 16384
bytes and breaking it into just-larger-than-Ethernet clusters, plus
one leftover fragment, gave a ready source of intermediate mbufs.)
Again, take all the above with a large grain of salt.
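
The three-size scheme recalled above amounts to picking the smallest size
class that fits a request. A minimal sketch, assuming hypothetical class
sizes of 128, 512 and 1536 bytes (the last just above a 1518-byte maximum
Ethernet frame); pick_class() and size_classes are illustrative names, not
an existing allocator API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes loosely following the recollection above:
 * small, about 512 bytes, and just over a full Ethernet frame. */
static const size_t size_classes[] = { 128, 512, 1536 };
#define NCLASSES (sizeof(size_classes) / sizeof(size_classes[0]))

/* Return the smallest class that holds len bytes, or 0 if none does
 * (the caller would then need a chain or a larger buffer). */
static size_t pick_class(size_t len)
{
    size_t i;

    assert(len > 0);
    for (i = 0; i < NCLASSES; i++)
        if (len <= size_classes[i])
            return size_classes[i];
    return 0;
}
```

For example, a 450-byte request lands in the 512-byte class and wastes 62
bytes, where a 2048-byte cluster would waste 1598.
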
But I'd guess adding either "intermediate" mbufs or non-power-of-two
mbuf-cluster support is more than you'd want to bite off right now?
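
The carving trick in the parenthetical above, which is also roughly what
non-power-of-two cluster support would buy, is simple arithmetic. A sketch
under assumed numbers: a 16384-byte chunk, and clusters sized to a 1518-byte
Ethernet frame rounded up to a 64-byte multiple (1536 bytes). carve_chunk(),
ROUND_UP, CL_ALIGN and the other names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical numbers: a 16 KiB chunk carved into clusters just larger
 * than a 1518-byte Ethernet frame, rounded up to a 64-byte multiple. */
#define CHUNK_BYTES     16384
#define ETHER_FRAME     1518
#define CL_ALIGN        64
#define ROUND_UP(x, a)  ((((x) + (a) - 1) / (a)) * (a))

struct carve {
    size_t nclusters;   /* full-size clusters obtained from the chunk */
    size_t leftover;    /* bytes remaining at the end of the chunk */
};

/* Split chunk bytes into as many clsize-byte clusters as fit. */
static struct carve carve_chunk(size_t chunk, size_t clsize)
{
    struct carve c;

    assert(clsize > 0);
    c.nclusters = chunk / clsize;
    c.leftover = chunk - c.nclusters * clsize;
    return c;
}
```

Under these assumptions the chunk yields 10 clusters plus a 1024-byte
leftover (which could, for instance, be split into two 512-byte
intermediate buffers), whereas 2048-byte power-of-two clusters yield only
8, each carrying 530 bytes of slack beyond a full frame.
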