Subject: Re: arm32 pmap changes
To: None <port-arm32@netbsd.org>
From: Chris Gilbert <chris@paradox.demon.co.uk>
List: port-arm32
Date: 06/25/2001 00:29:33
Just an update on where I'm up to.
On Friday 22 June 2001 12:47 am, Chris Gilbert wrote:
[snip]
> Over the next few days I'm planning:
> use a pool for the pmap structs, should improve performance (will benchmark
> to confirm this)
done.
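For anyone who hasn't looked at pool-style allocation, here is a minimal user-space sketch of the free-list idea behind a fixed-size object pool. It is not the kernel's pool code and not the actual pmap change: struct toy_pmap, struct toy_pool, TOY_POOL_SIZE and the toy_pool_* functions are all made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real pmap struct; fields are illustrative. */
struct toy_pmap {
    int refcount;
    struct toy_pmap *next_free;   /* link used only while on the free list */
};

#define TOY_POOL_SIZE 4           /* hypothetical fixed capacity */

struct toy_pool {
    struct toy_pmap items[TOY_POOL_SIZE];
    struct toy_pmap *free_list;
};

/* Thread every item onto the free list. */
static void toy_pool_init(struct toy_pool *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
        p->items[i].next_free = p->free_list;
        p->free_list = &p->items[i];
    }
}

/* Pop one item in O(1); returns NULL when the pool is exhausted. */
static struct toy_pmap *toy_pool_get(struct toy_pool *p)
{
    struct toy_pmap *pm = p->free_list;
    if (pm != NULL) {
        p->free_list = pm->next_free;
        pm->next_free = NULL;
        pm->refcount = 1;
    }
    return pm;
}

/* Push an item back in O(1); the most recently freed item is reused first. */
static void toy_pool_put(struct toy_pool *p, struct toy_pmap *pm)
{
    assert(pm != NULL);
    pm->refcount = 0;
    pm->next_free = p->free_list;
    p->free_list = pm;
}
```

Both get and put are a couple of pointer moves, which is where a gain over a general-purpose allocator would be expected to come from.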
> clean up pmap struct, (has a couple of seemingly dead entries in it,
> pm_unused1 and pm_dref, need to verify they really are dead (cats thinks
> they're dead though)
done.
> implement pmap_map_ptes and unmap_ptes. This is based on Richard's
> version. I plan to use it for pmap_remove initially, then expand it into
> pmap_enter and vac_me_harder.
done.
I still need to make pmap_enter use map_ptes (and that'll cut its time down
a bit as well).
Profiling now shows fewer calls to pmap_pte, and somehow pmap_map_ptes is
actually faster per call than pmap_pte as well (0.43 us/call against
1.65 us/call). (Note that remrunqueue includes the idle loop as well, hence
the large amount of time in it ;)
Currently it looks like I should look into reducing the number of calls to
splx: we call splvm a hell of a lot in the pmap. I might look at the
locking in the i386 version to see if we can replace the splvm calls with it.
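To put a rough number on that, the sketch below recomputes self time from the calls and self us/call columns of the splx and raisespl rows in the flat profile at the end of this mail. It assumes raisespl is where the splvm calls end up, which I haven't stated above; self_seconds and spl_overhead_seconds are made-up helper names.

```c
#include <assert.h>

/* Self time in seconds implied by a gprof row: calls * self us/call. */
static double self_seconds(unsigned long calls, double self_us_per_call)
{
    assert(self_us_per_call >= 0.0);
    return (double)calls * self_us_per_call / 1e6;
}

/* Combined self time of the splx and raisespl rows from the profile. */
static double spl_overhead_seconds(void)
{
    return self_seconds(2331473UL, 1.41)    /* splx row */
         + self_seconds(2313370UL, 1.03);   /* raisespl row (assumed splvm path) */
}
```

That comes to roughly 3.29 + 2.38 = 5.67 seconds, a little more than the 5.17 seconds of self time in uvm_fault.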
Another major gain would be to sort out pmap_release so it doesn't have to
walk the whole of the L1 table looking for items to free off (we should do
that in pmap_remove).
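One possible shape for this, as a toy user-space sketch only: free each second-level table in the remove path as soon as its last mapping goes, and keep a per-pmap count of live tables so that release can stop early or skip the walk entirely. The counter is my assumption rather than anything in the tree, and every name here (struct toy_l2, struct toy_pmap2, the toy_* functions, the tiny table sizes) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_L1_ENTRIES 8   /* hypothetical; a real L1 table is far larger */
#define TOY_L2_ENTRIES 4   /* hypothetical */

struct toy_l2 {
    unsigned used;                      /* valid mappings in this table */
    int valid[TOY_L2_ENTRIES];
};

struct toy_pmap2 {
    struct toy_l2 *l1[TOY_L1_ENTRIES];
    unsigned l2_count;                  /* live second-level tables */
};

/* Add a mapping, allocating the second-level table on first use. */
static int toy_enter(struct toy_pmap2 *pm, unsigned i1, unsigned i2)
{
    assert(i1 < TOY_L1_ENTRIES && i2 < TOY_L2_ENTRIES);
    if (pm->l1[i1] == NULL) {
        pm->l1[i1] = calloc(1, sizeof(struct toy_l2));
        if (pm->l1[i1] == NULL)
            return -1;
        pm->l2_count++;
    }
    if (!pm->l1[i1]->valid[i2]) {
        pm->l1[i1]->valid[i2] = 1;
        pm->l1[i1]->used++;
    }
    return 0;
}

/* Drop a mapping; free the second-level table when it becomes empty. */
static void toy_remove(struct toy_pmap2 *pm, unsigned i1, unsigned i2)
{
    assert(i1 < TOY_L1_ENTRIES && i2 < TOY_L2_ENTRIES);
    struct toy_l2 *l2 = pm->l1[i1];
    if (l2 == NULL || !l2->valid[i2])
        return;
    l2->valid[i2] = 0;
    if (--l2->used == 0) {
        free(l2);
        pm->l1[i1] = NULL;
        pm->l2_count--;
    }
}

/* Free whatever is left; returns how many L1 slots had to be inspected. */
static unsigned toy_release(struct toy_pmap2 *pm)
{
    unsigned walked = 0;
    for (unsigned i = 0; pm->l2_count > 0 && i < TOY_L1_ENTRIES; i++) {
        walked++;
        if (pm->l1[i] != NULL) {
            free(pm->l1[i]);
            pm->l1[i] = NULL;
            pm->l2_count--;
        }
    }
    return walked;
}
```

If every mapping has already been removed, toy_release inspects no L1 slots at all; otherwise it stops as soon as the count reaches zero.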
Note that the profile is from doing a time make configure of gmake.
Another optimisation (again something Richard suggested) is to zero pages
while idling, so that zeroed pages can be allocated faster :)
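A rough user-space sketch of how idle-time zeroing could be organised, assuming two free lists (one for pages with unknown contents, one for pages already zeroed). None of this is existing kernel code: struct toy_page, struct toy_freelists, TOY_PAGE_SIZE and the toy_* functions are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64   /* hypothetical; tiny so the sketch stays cheap */

struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
    struct toy_page *next;
};

struct toy_freelists {
    struct toy_page *dirty;    /* free, contents unknown */
    struct toy_page *zeroed;   /* free, already zero-filled */
};

/* Freed pages go on the dirty list; nothing is zeroed at free time. */
static void toy_free_page(struct toy_freelists *fl, struct toy_page *pg)
{
    assert(pg != NULL);
    pg->next = fl->dirty;
    fl->dirty = pg;
}

/* One unit of idle work: zero a single dirty page. Returns 1 if work was done. */
static int toy_idle_zero_one(struct toy_freelists *fl)
{
    struct toy_page *pg = fl->dirty;
    if (pg == NULL)
        return 0;
    fl->dirty = pg->next;
    memset(pg->data, 0, sizeof(pg->data));
    pg->next = fl->zeroed;
    fl->zeroed = pg;
    return 1;
}

/* Allocate a zero-filled page; *was_prezeroed reports whether the fast path hit. */
static struct toy_page *toy_alloc_zeroed(struct toy_freelists *fl,
                                         int *was_prezeroed)
{
    struct toy_page *pg = fl->zeroed;
    if (pg != NULL) {
        fl->zeroed = pg->next;          /* fast path: no memset needed */
        *was_prezeroed = 1;
    } else if ((pg = fl->dirty) != NULL) {
        fl->dirty = pg->next;           /* slow path: zero on demand */
        memset(pg->data, 0, sizeof(pg->data));
        *was_prezeroed = 0;
    } else {
        *was_prezeroed = 0;
        return NULL;                    /* no free pages at all */
    }
    pg->next = NULL;
    return pg;
}
```

Zeroing one page per idle step keeps each step short, so the idle loop can get out of the way quickly when something becomes runnable.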
Cheers,
Chris
Profiling now shows:
Flat profile:
Each sample counts as 0.01 seconds.
     % cumulative     self                 self      total
  time    seconds  seconds     calls    us/call    us/call  name
 27.76      32.57    32.57                                  _mcount
  7.42      41.28     8.71                                  mcount
  4.41      46.45     5.17    246628      20.96     106.35  uvm_fault
  3.31      50.33     3.88        14  277142.86  417857.14  remrunqueue
  3.22      54.11     3.78     42211      89.55      89.55  bcopy_page
  2.80      57.40     3.29   2331473       1.41       1.50  splx
  2.63      60.49     3.09     21616     142.95     142.95  sa110_cache_purgeID
  2.49      63.41     2.92   1086199       2.69       2.69  lockmgr
  2.42      66.25     2.84    228041      12.45      21.20  data_abort_handler
  2.03      68.63     2.38   2313370       1.03       1.12  raisespl
  1.82      70.77     2.14    459854       4.65      10.14  pmap_enter
  1.72      72.79     2.02                                  SetCPSR
  1.68      74.76     1.97  34468917       0.06       0.06  cpufunc_nullop
  1.64      76.68     1.92    827958       2.32       2.32  pmap_vac_me_harder
  1.62      78.58     1.90     45560      41.70      41.70  bzero_page
  1.58      80.43     1.85    480131       3.85       3.85  uvm_pageactivate
  1.33      81.99     1.56     84942      18.37      18.37  memset
  1.30      83.52     1.53     91499      16.72      18.11  uvm_pagealloc_strat
  1.24      84.98     1.46    136958      10.66     491.69  syscall
  1.21      86.40     1.42     37614      37.75     162.21  pmap_remove
  1.08      87.67     1.27    768200       1.65       1.65  pmap_pte
  0.94      88.77     1.10    113404       9.70       9.70  copyout
  0.90      89.83     1.06    133692       7.93       7.93  sa110_cache_purgeD_rng
  0.78      90.74     0.91    311751       2.92       2.92  _memcpy
  0.76      91.63     0.89   1133114       0.79       1.22  pmap_extract
  0.74      92.50     0.87     87418       9.95       9.95  sa110_cache_purgeID_rng
  0.68      93.30     0.80     90467       8.84      26.76  uvm_pagefree
  0.67      94.09     0.79    423348       1.87       1.87  uvm_map_lookup_entry
  0.61      94.80     0.71     64469      11.01      20.61  genfs_getpages
  0.60      95.50     0.70   1640675       0.43       0.43  pmap_map_ptes
  0.53      96.12     0.62   2297041       0.27       0.28  dosoftints
  0.53      96.74     0.62      1891     327.87    1302.70  pmap_release
  0.50      97.33     0.59     67040       8.80      12.59  prefetch_abort_handler
  0.47      97.88     0.55    336049       1.64       2.76  pmap_remove_pv
  0.43      98.39     0.51    336938       1.51       2.63  pmap_enter_pv
  0.43      98.89     0.50     68225       7.33       7.33  copyoutstr
  0.43      99.39     0.50     42211      11.85     139.28  pmap_copy_page
  0.42      99.88     0.49     92611       5.29       9.02  malloc
  0.38     100.32     0.44    346933       1.27      14.54  uvm_pagelookup
  0.37     100.75     0.43    257360       1.67       2.79  pmap_modify_pv
  0.37     101.18     0.43    104720       4.11       4.12  pool_get
  0.36     101.60     0.42   4560145       0.09       0.09  irq_setmasks
  0.36     102.02     0.42     74394       5.65      13.43  cache_lookup
  0.35     102.43     0.41    141885       2.89       4.67  pmap_handled_emulation
  0.35     102.84     0.41     25434      16.12     559.63  lookup
  0.33     103.23     0.39    365387       1.07       1.07  userret
  0.31     103.59     0.36     86156       4.18      13.75  pmap_modified_emulation