Subject: pmap tweaking (was Re: Things to work on)
To: None <port-arm32@netbsd.org, port-arm@netbsd.org>
From: Chris Gilbert <chris@paradox.demon.co.uk>
List: port-arm
Date: 06/01/2001 00:40:15
Just to update people.
I've had a play with the pmap stuff, and I've managed to get my head around
how most of it works and what the terminology is. I've got the ./lat_proc
fork time (from lmbench) down to half on a CATS: from 16000 microseconds down
to 8000 microseconds (my PII 333 gets 1000 microseconds).
Note that this is by no means an accurate or real-world test, just a sign
that something is better (and note that this is with PMAP_DEBUG on,
DIAGNOSTIC on, and a whole pile of other debug stuff on). I'll retest with
something more realistic at some point, e.g. timing a make configure for
gmake.
One reason for the above is that pmap_release() currently scans the whole of
the L1 table for entries. However, by using a uvm_object, and allocating the
L2 tables and associating them with that uvm_object, you can instead walk the
uvm_object's page list and free them off :)
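Roughly, the idea looks like this (just a sketch of the approach, not the
actual diff: pm_obj is a made-up member name, and I've left out all the
locking):

    /*
     * Free a pmap's L2 tables by walking the pages hung off its
     * uvm_object, instead of scanning every L1 slot looking for them.
     */
    void
    pmap_release(struct pmap *pmap)
    {
        struct uvm_object *uobj = &pmap->pm_obj;  /* assumed member */
        struct vm_page *pg;

        /* every page on the object's memq is an L2 table we own */
        while ((pg = TAILQ_FIRST(&uobj->memq)) != NULL)
            uvm_pagefree(pg);  /* also unlinks it from the object */
    }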
However, I'm having some issues with it that I need to look into.
I'm also playing with some other tweaks to the code, but I really need to sit
down again and make some clear notes, e.g. on how to tell whether a page is
wired, modified, referenced, etc.
I also implemented a pool for the pmap objects, so that we don't keep
allocating and freeing them.
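In outline it's just the usual pool(9) usage, something like this (sketch
only: the pool_init() argument list is from memory, and the allocator/type
arguments may not match exactly what's in the tree):

    #include <sys/pool.h>

    static struct pool pmap_pmap_pool;

    /* once, at pmap initialisation time */
    pool_init(&pmap_pmap_pool, sizeof(struct pmap), 0, 0, 0,
        "pmappl", 0, pool_page_alloc_nointr, pool_page_free_nointr,
        M_VMPMAP);

    /* pmap_create() then becomes roughly: */
    pmap = pool_get(&pmap_pmap_pool, PR_WAITOK);
    bzero(pmap, sizeof(*pmap));

    /* and pmap_destroy(), once the reference count hits zero: */
    pool_put(&pmap_pmap_pool, pmap);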
I've also played with getting rid of the static L1 tables, but somehow they
keep ending up fragmented after being freed and reused *sigh*. That suggests
some kind of VM leak somewhere. Sadly, allocating 4 pages in a contiguous
block will always be a problem. I did consider putting them in a pool, but
that won't work as-is, because we need the pglist. One thought I had, though
I'm not sure it'll work, is to keep the l1pt structs in a pool with the
pglist still inside the struct; I've not had time to examine how the pool
code works to tell whether this is feasible.
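What I have in mind is something like this (pure sketch: the field names are
guesses at the arm32 ones, and I haven't checked how this interacts with the
pool code at all):

    struct l1pt {
        struct pglist pt_plist;   /* pages backing the L1 table */
        vaddr_t       pt_va;      /* KVA the table is mapped at */
        int           pt_flags;
    };

    static struct pool l1pt_pool;

    /* allocation path, e.g. from pmap_alloc_l1pt(): */
    pt = pool_get(&l1pt_pool, PR_WAITOK);
    TAILQ_INIT(&pt->pt_plist);
    /* 16k of physical memory, aligned to 16k as the L1 table needs */
    error = uvm_pglistalloc(PD_SIZE, physical_start, physical_end,
        PD_SIZE, 0, &pt->pt_plist, 1, M_WAITOK);
    if (error) {
        pool_put(&l1pt_pool, pt);
        return (NULL);
    }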
Note that you can actually have more than 256 processes: I ran lat_ctx with
300 processes and it worked. The issue is getting that 16k of contiguous
memory (I'm also wondering whether there's a problem finding 16k of kernel VM
space for it...)
So many things to think on/consider.
If anyone is interested in looking at the stuff I've done so far, let me know
and I'll mail the current diff.
Cheers,
Chris