Userland is pageable, so when mmap is called with one page, the kernel does
not yet make the page available to the CPU. Rather, it records the mapping
and waits for the page to fault, and at fault time it enters the page for
real. This means the kernel code path from the fault to the moment the page
is entered needs to be fast.
All this to say that an optimization is possible in pmap_enter_ma [1] on
x86. In this function, new_pve and new_sparepve are always allocated, but
not always needed. It is done this way because preemption is disabled in
the critical section, so the allocations must be performed earlier.
new_pve and new_sparepve are later consumed in pmap_enter_pv. After adding
atomic counters there, a './build.sh tools' gives these numbers:
PVE: used=36441394 unused=58955001
SPAREPVE: used=1647254 unused=93749141
It means that 38088648 allocations were needed and performed, while
152704142 were performed but never used. In short, only about 20% of the
allocated buffers were needed.
Verily, the real percentage may be even smaller than that, since I didn't
take into account the fact that there may be no p->v tracking at all (in
which case both buffers would be unused as well).
I have a patch [2] which introduces two inlined functions that can tell
earlier whether these buffers are needed. One problem with this patch is
that it makes the code harder to understand, even though I tried to explain
clearly what we are doing. Another problem is that when both buffers are
needed, my patch introduces a little overhead (the cost of a few branches).
I don't know whether we care enough about this kind of micro-optimization;
if anyone has comments, feel free to share them.
[1] https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap.c#4061
[2] http://m00nbsd.net/garbage/pmap/enter.diff