Port-arm archive
Re: aarch64 pmap tweaks for review
Noticeably faster with this patch applied, but I eventually hit this
panic during a pkgsrc bulk build:
[ 2471.3017625] ubc_uiomove_direct: error=14
[ 2471.3017625] ubc_uiomove_direct: error=14
[ 2471.3117632] pid 27587 (as): user write of 98456@0xf9c3401b9000 at 142352 failed: 14
[ 2471.3217665] panic: Trap: Data Abort (EL1): Translation Fault L0 with read access for fffffffdfffff000: pc ffffc0000008da80: opcode f8647863: ldr x3, [x3,x4,lsl #3]
[ 2471.3317638] cpu17: Begin traceback...
[ 2471.3317638] trace fp ffffc008784cc5c0
[ 2471.3417639] fp ffffc008784cc5e0 vpanic() at ffffc000004b262c netbsd:vpanic+0x15c
[ 2471.3517632] fp ffffc008784cc650 panic() at ffffc000004b2724 netbsd:panic+0x44
[ 2471.3517632] fp ffffc008784cc6e0 data_abort_handler() at ffffc0000008c26c netbsd:data_abort_handler+0x4dc
[ 2471.3617726] tf ffffc008784cc760 el1_trap() at ffffc00000088b58 netbsd:el1_trap
[ 2471.3717676] ---- trapframe 0xffffc008784cc760 (304 bytes) ----
[ 2471.3817705] pc=ffffc0000008da80, spsr=0000000060000005
[ 2471.3817705] esr=0000000096000004, far=fffffffdfffff000
[ 2471.3917687] x0=fffffffdfffff000, x1=0000f9c340030000
[ 2471.3917687] x2=ffffc008784ccb08, x3=fffffffdfffff000
[ 2471.4017692] x4=0000000000000000, x5=0000000000200000
[ 2471.4017692] x6=0000000000000001, x7=0000000000000003
[ 2471.4117687] x8=6bc3000f9c33fdf7, x9=0000000000000050
[ 2471.4217694] x10=0000000000000000, x11=0000000000400000
[ 2471.4217694] x12=0000f6ba5a73d000, x13=0000f6ba5a73d000
[ 2471.4317696] x14=0000000000000001, x15=0000000000001002
[ 2471.4317696] x16=0000f6ba5a73fd50, x17=0000f6ba5a6cea74
[ 2471.4417700] x18=0000000000000016, x19=0000f9c340030000
[ 2471.4417700] x20=ffff00009e378f80, x21=0000000000000000
[ 2471.4517694] x22=0000f9c340060000, x23=0000000000000000
[ 2471.4617701] x24=ffffc00000954f68, x25=ffffc00000954af0
[ 2471.4617701] x26=ffffc008784ccb58, x27=ffff00886c76a400
[ 2471.4717698] x28=ffff00009bc25b60, fp=x29=ffffc008784cca90
[ 2471.4717698] lr=x30=ffffc0000008e68c, sp=ffffc008784cca90
[ 2471.4817706] ------------------------------------------------
[ 2471.4817706] fp ffffc008784cca90 _pmap_pte_lookup_bs() at ffffc0000008da80 netbsd:_pmap_pte_lookup_bs+0x68
[ 2471.4917705] fp ffffc008784ccaa0 _pmap_remove() at ffffc0000008e688 netbsd:_pmap_remove+0x80
[ 2471.5017713] fp ffffc008784ccb10 pmap_remove() at ffffc00000090b90 netbsd:pmap_remove+0x128
[ 2471.5117697] fp ffffc008784ccb60 uvm_unmap_remove() at ffffc000004222d8 netbsd:uvm_unmap_remove+0x258
[ 2471.5217700] fp ffffc008784ccbe0 uvmspace_free() at ffffc00000423010 netbsd:uvmspace_free+0xc8
[ 2471.5317697] fp ffffc008784ccc10 exit1() at ffffc00000457404 netbsd:exit1+0x174
[ 2471.5417700] fp ffffc008784ccd00 sigexit() at ffffc0000047d030 netbsd:sigexit+0x1e8
[ 2471.5417700] fp ffffc008784ccd50 postsig() at ffffc0000047d488 netbsd:postsig+0x280
[ 2471.5517694] fp ffffc008784cce20 lwp_userret() at ffffc00000461d60 netbsd:lwp_userret+0x1a8
[ 2471.5617691] fp ffffc008784cce70 trap_el0_sync() at ffffc0000008b5f8 netbsd:trap_el0_sync+0x448
[ 2471.5717692] tf ffffc008784cced0 el0_trap() at ffffc00000088bc4 netbsd:el0_trap
[ 2471.5817690] ---- trapframe 0xffffc008784cced0 (304 bytes) ----
[ 2471.5817690] pc=0000fffff1e0df04, spsr=0000000020000000
[ 2471.5917689] esr=0000000092000001, far=0000f9c340278000
[ 2471.5917689] x0=00000002001885f0, x1=0000f9c340278000
[ 2471.6017687] x2=0000000000037a60, x3=0000f9c3401e0000
[ 2471.6017687] x4=0000000000000000, x5=0000fffff1e22000
[ 2471.6117688] x6=00000002001950e0, x7=0000000001a742fc
[ 2471.6217685] x8=0000000000000000, x9=0000000000000000
[ 2471.6217685] x10=0000000000000007, x11=0000000000000001
[ 2471.6317686] x12=000003e70d00b4c0, x13=000003e70d00b4c5
[ 2471.6317686] x14=0000000000000040, x15=0000f9c3402d3150
[ 2471.6417686] x16=0000000000000150, x17=0000000000000000
[ 2471.6417686] x18=0000000000011e88, x19=0000000200101dc5
[ 2471.6517684] x20=0000000200102760, x21=0000000005854375
[ 2471.6617684] x22=0000000000000018, x23=00000002001885f0
[ 2471.6617684] x24=0000fffff1e0e428, x25=0000000000044660
[ 2471.6717684] x26=0000f9c3402ce400, x27=0000f9c3402ce000
[ 2471.6717684] x28=0000000000000000, fp=x29=0000000000000000
[ 2471.6817682] lr=x30=0000fffff1e04d38, sp=0000ffffff8a21b0
[ 2471.6817682] ------------------------------------------------
[ 2471.6917682] cpu17: End traceback...
Stopped in pid 27587.27587 (as) at netbsd:cpu_Debugger+0x4:
ret
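For anyone decoding the first trapframe by hand, here is a small sketch that pulls the exception class and fault status out of the esr value and recomputes the address the faulting ldr would have used. The field positions are the architectural ESR_ELx layout; the helper names (esr_ec, esr_il, esr_dfsc, esr_wnr, ldr_lsl3_addr) are made up for illustration and are not NetBSD functions.

```c
#include <stdint.h>

/* Hypothetical helpers, not kernel API. Bit positions follow ESR_ELx. */

/* Exception class: bits [31:26]. */
static inline unsigned esr_ec(uint64_t esr)
{
	return (unsigned)((esr >> 26) & 0x3f);
}

/* Instruction length bit: bit 25 (1 = 32-bit instruction). */
static inline unsigned esr_il(uint64_t esr)
{
	return (unsigned)((esr >> 25) & 0x1);
}

/* Data fault status code: bits [5:0] of the ISS for data aborts. */
static inline unsigned esr_dfsc(uint64_t esr)
{
	return (unsigned)(esr & 0x3f);
}

/* Write-not-read: bit 6 of the ISS (0 = read, 1 = write). */
static inline unsigned esr_wnr(uint64_t esr)
{
	return (unsigned)((esr >> 6) & 0x1);
}

/* Effective address of "ldr xT, [base, index, lsl #3]". */
static inline uint64_t ldr_lsl3_addr(uint64_t base, uint64_t index)
{
	return base + (index << 3);
}
```

With esr=0x96000004 this gives EC 0x25 (data abort taken without a change in exception level), DFSC 0x04 (translation fault, level 0) and WnR 0 (read), which matches the panic message. With x3=fffffffdfffff000 and x4=0 the computed load address is the same as far.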
On Mon, 1 Jun 2020, Andrew Doran wrote:
Hi,
I made some tweaks to the aarch64 pmap based on lessons learned in the x86
pmap recently. They reduce memory consumption and speed up things like
fork/exec/exit/UBC a little:
http://www.netbsd.org/~ad/2020/aarch64.diff
Approximate times for a kernel build on an RK3399 with all 6 cores running at
600 MHz:
before  1354.07s real  6092.55s user  1591.35s system
after   1307.90s real  6026.60s user  1432.83s system
Description below. Comments welcome.
Thanks,
Andrew
- Fix a lock order reversal via pmap_page_protect().
- Align struct pmap to a cache line boundary.
- Move wired/resident count update out from PMAPCOUNTERS ifdef in one place.
It shouldn't depend on it.
- Make sure pmap is always locked when updating stats. Then atomics are no
longer needed to update stats.
- Remove unneeded traversal of PV lists in pmap_enter_pv().
- Shrink struct vm_page from 136 to 128 bytes (cache line sized - reduce
cache misses).
- Shrink struct pv_entry from 48 to 32 bytes (power of 2 sized - reduce cache
misses).
- Embed a pv_entry in each vm_page. That means PV entries don't need to be
allocated for pages that are mapped only once, for example private
anonymous memory / COW pages / most UBC mappings. Dynamic PV entries are
then used only for stuff like shared libraries and shared memory.
- Comment out PMAPCOUNTERS option because global counters are costly on MP
due to cache coherency overhead. The problem gets exponentially worse as
more CPUs are added.
- Use the pmap as a source of pre-zeroed pages for the VM system.
- Do unlocked checks in pmap_page_protect() and pmap_clear_modify(): avoid
taking the lock if the page has no mappings.
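The embedded pv_entry item in the list above can be pictured roughly as follows. This is only a sketch of the general idea, with hypothetical names (pv_sketch, page_sketch, page_enter_pv, page_has_mappings) and userland malloc standing in for a kernel allocator; it is not the layout or the code from the diff, and it ignores locking entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types for illustration only. */
struct pv_sketch {
	struct pv_sketch *next;	/* further mappings of the same page */
	void *pmap;		/* owning pmap, NULL if this slot is unused */
	uintptr_t va;		/* virtual address of the mapping */
};

struct page_sketch {
	struct pv_sketch first;	/* embedded: enough for a page mapped once */
};

/* Nonzero if the page has at least one recorded mapping. */
static int
page_has_mappings(const struct page_sketch *pg)
{
	return pg->first.pmap != NULL;
}

/*
 * Record a mapping of pg at va in pmap.  The first mapping uses the
 * embedded slot; only later ones allocate.  *allocs counts dynamic
 * allocations.  Returns 0 on success, -1 if allocation fails.
 */
static int
page_enter_pv(struct page_sketch *pg, void *pmap, uintptr_t va,
    unsigned *allocs)
{
	struct pv_sketch *pv;

	if (!page_has_mappings(pg)) {
		pg->first.pmap = pmap;
		pg->first.va = va;
		return 0;
	}

	pv = malloc(sizeof(*pv));
	if (pv == NULL)
		return -1;
	pv->pmap = pmap;
	pv->va = va;
	pv->next = pg->first.next;	/* link in right after the embedded entry */
	pg->first.next = pv;
	(*allocs)++;
	return 0;
}
```

In this sketch a page that is mapped once never touches the allocator, and a second mapping costs exactly one allocation. The emptiness test in page_has_mappings is the sort of thing an unlocked fast path could look at before deciding to take a lock, though a real kernel version would have to consider memory ordering, which is left out here.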