Subject: sparc64 pmap optimizations
To: port-sparc64@netbsd.org, tech-kern@netbsd.org
From: Chuck Silvers <chuq@chuq.com>
List: port-sparc64
Date: 08/25/2002 21:46:18
hi folks,
as a warm-up to working on sparc64 MP support, I spent some time
improving the pmap. the two cases I was looking at in particular
were fork+exit and fork+exec+exit. the results are:
fork+exit
orig 0.025u 0.979s 0:09.22 10.7% 0+0k 0+1io 0pf+0w
chuq 0.014u 0.674s 0:07.51 9.0% 0+0k 0+0io 0pf+0w
improvement 18.5% (elapsed time)
fork+exec+exit
orig 0.076u 2.232s 0:07.57 30.3% 0+0k 0+3io 0pf+0w
chuq 0.137u 1.609s 0:05.92 29.2% 0+0k 0+1io 0pf+0w
improvement 21.8% (elapsed time)
the diffs are available for review at
ftp://ftp.netbsd.org/pub/NetBSD/misc/chs/sparc64pmap/
(as per my usual patch distribution method, apply the latest patch
to -current as of that date).
the MI changes are:
- there's a new optional pmap interface:
void pmap_predestroy(struct pmap *)
this is a hint to the pmap layer that this pmap will be destroyed soon,
and that the only operations that will be performed on it before then
are pmap_remove()s. the pmap layer indicates the availability of this
interface to UVM by defining __HAVE_PMAP_PREDESTROY in <machine/pmap.h>.
a sketch of the intended usage follows this list.
- there's an optional semantic change to pmap_activate():
I've got uvmspace_exec() temporarily switching to the kernel's pmap
(by calling pmap_activate(&proc0)) in order to speed up tearing down
the process's old pmap. however, many pmaps cannot handle this.
a pmap can indicate that it's willing to be called in this context
by defining __HAVE_PMAP_ACTIVATE_KERNEL in <machine/pmap.h>.
this is also sketched below.
- we cache a few free kernel stacks (currently 16) to avoid cache flushing
and mucking with kernel_map. there's currently no feedback mechanism
for forcing these to be freed. I'm imagining some kind of registration
mechanism for the pagedaemon to call back to the various subsystems that
might be hanging on to significant memory that could be freed easily.
the stack cache itself is sketched below as well.
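to make pmap_predestroy() concrete, here's roughly how the UVM side
uses it when the last reference to an address space goes away. this is
a condensed, illustrative sketch (reference counting and the rest of
uvmspace_free() omitted), not the literal diff:

	/* hint that the pmap is about to die, so the pmap layer can
	 * skip the cache/TLB flushing it would otherwise do for each
	 * pmap_remove() below. */
#ifdef __HAVE_PMAP_PREDESTROY
	pmap_predestroy(vm->vm_map.pmap);
#endif
	/* tear down all remaining mappings, then destroy the pmap. */
	uvm_unmap(&vm->vm_map, vm->vm_map.min_offset, vm->vm_map.max_offset);
	pmap_destroy(vm->vm_map.pmap);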
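the uvmspace_exec() trick then looks something like this (again
condensed and illustrative, not the literal diff):

	/* detach from the dying address space while we empty it, so
	 * the pmap layer doesn't have to keep its context coherent. */
#ifdef __HAVE_PMAP_ACTIVATE_KERNEL
	pmap_activate(&proc0);		/* run on the kernel's pmap */
#endif
	uvm_unmap(map, map->min_offset, map->max_offset);
#ifdef __HAVE_PMAP_ACTIVATE_KERNEL
	pmap_activate(p);		/* back to p's now-empty pmap */
#endif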
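and the kernel stack cache is just a small LIFO, roughly like this
(all of these names are made up for illustration, and locking and the
real stack-allocation path are omitted):

	#define	USTACK_CACHE_MAX	16

	static vaddr_t ustack_cache[USTACK_CACHE_MAX];
	static int ustack_cache_cnt;

	vaddr_t
	ustack_alloc(void)
	{
		/* reusing a cached stack avoids flushing the cache
		 * and mucking with kernel_map. */
		if (ustack_cache_cnt > 0)
			return ustack_cache[--ustack_cache_cnt];
		return uvm_km_alloc(kernel_map, USPACE);
	}

	void
	ustack_free(vaddr_t va)
	{
		if (ustack_cache_cnt < USTACK_CACHE_MAX) {
			ustack_cache[ustack_cache_cnt++] = va;
			return;
		}
		uvm_km_free(kernel_map, va, USPACE);
	}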
here's the overall list of changes:
- use struct vm_page_md for attaching pv entries to struct vm_page
- change pseg_set()'s return value to indicate whether the spare page
was used as an L2 or L3 PTP.
- use a pool for pv entries instead of malloc().
- put PTPs on a list attached to the pmap so we can free them
more efficiently (by just walking the list) in pmap_destroy().
(sketched after this list.)
- use the new pmap_predestroy() interface to avoid flushing the cache and TLB
for each pmap_remove() that's done as we are tearing down an address space.
- in pmap_enter(), handle replacing an existing mapping more efficiently
than just calling pmap_remove() on it. also, skip flushing the
TSB and TLB if there was no previous mapping, since there can't be
anything we need to flush.
- allocate hardware contexts the way the MIPS pmap does:
hand them out sequentially without reuse, and once we run out,
invalidate all user TLB entries and flush the entire CPU cache
before starting over. (sketched after this list.)
- fix pmap_extract() for the case where the va is not page-aligned and
nothing is mapped there.
- fix the calculation of the TSB size. it was comparing physmem
(which is in units of pages) against constants that only make sense
if they are in units of bytes. (see the snippet after this list.)
- remove code to handle impossible cases in various functions.
- tweak asm code to pipeline a little better.
- remove a few unnecessary spls.
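the PTP list is the obvious thing (the pm_ptps field name is
illustrative):

	/* in struct pmap: all of this pmap's page-table pages */
	TAILQ_HEAD(, vm_page) pm_ptps;

	/* whenever a PTP is allocated: */
	TAILQ_INSERT_TAIL(&pm->pm_ptps, ptp, pageq);

	/* in pmap_destroy(): free them all with a single walk,
	 * instead of searching the page-table tree. */
	while ((ptp = TAILQ_FIRST(&pm->pm_ptps)) != NULL) {
		TAILQ_REMOVE(&pm->pm_ptps, ptp, pageq);
		uvm_pagefree(ptp);
	}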
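and the context allocation scheme, sketched (the names and the two
flush helpers here are illustrative):

	static u_int ctx_next = 1;	/* context 0 is the kernel's */
	static u_int ctx_gen = 1;	/* bumped at every wraparound */

	void
	ctx_alloc(struct pmap *pm)
	{
		if (pm->pm_ctxgen == ctx_gen)
			return;		/* context still valid */
		if (ctx_next == numctx) {
			/* all contexts handed out: invalidate every
			 * user TLB entry, flush the entire CPU cache,
			 * and start over.  this invalidates every
			 * pmap's context in one shot, since their
			 * generation numbers are now stale. */
			tlb_flush_all_user();
			cache_flush_all();
			ctx_next = 1;
			ctx_gen++;
		}
		pm->pm_ctx = ctx_next++;
		pm->pm_ctxgen = ctx_gen;
	}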
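for the curious, the TSB sizing bug had this shape (the variable and
the threshold are illustrative):

	/* old, wrong: physmem counts pages, so this compared a page
	 * count against a byte count. */
	use_big_tsb = (physmem > 64 * 1024 * 1024);

	/* fixed: convert pages to bytes before comparing. */
	use_big_tsb = (ptoa(physmem) > 64 * 1024 * 1024);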
could people try out these changes and let me know how it goes?
I did my testing on an ultra2.
also, feedback on the changes in general would be great.
-Chuck