Port-arm archive
Re: aarch64 pmap tweaks for review
Hi,
I have committed this change.
On Thu, Jun 04, 2020 at 09:22:15AM +0100, Nick Hudson wrote:
> using the low 12 bits of pv_va is "common", cf
Done!
On Tue, Jun 02, 2020 at 09:47:08AM -0300, Jared McNeill wrote:
> Noticeably faster with this patch applied, but I eventually hit this panic
> during a pkgsrc bulk build:
This should be fixed now. I think it was a page being freed as PG_ZERO when
it was not totally zeroed.
Thanks,
Andrew
> [ 2471.3017625] ubc_uiomove_direct: error=14
> [ 2471.3017625] ubc_uiomove_direct: error=14
> [ 2471.3117632] pid 27587 (as): user write of 98456@0xf9c3401b9000 at 142352 failed: 14
> [ 2471.3217665] panic: Trap: Data Abort (EL1): Translation Fault L0 with read access for fffffffdfffff000: pc ffffc0000008da80: opcode f8647863: ldr x3, [x3,x4,lsl #3]
>
> [ 2471.3317638] cpu17: Begin traceback...
> [ 2471.3317638] trace fp ffffc008784cc5c0
> [ 2471.3417639] fp ffffc008784cc5e0 vpanic() at ffffc000004b262c netbsd:vpanic+0x15c
> [ 2471.3517632] fp ffffc008784cc650 panic() at ffffc000004b2724 netbsd:panic+0x44
> [ 2471.3517632] fp ffffc008784cc6e0 data_abort_handler() at ffffc0000008c26c netbsd:data_abort_handler+0x4dc
> [ 2471.3617726] tf ffffc008784cc760 el1_trap() at ffffc00000088b58 netbsd:el1_trap
> [ 2471.3717676] ---- trapframe 0xffffc008784cc760 (304 bytes) ----
> [ 2471.3817705] pc=ffffc0000008da80, spsr=0000000060000005
> [ 2471.3817705] esr=0000000096000004, far=fffffffdfffff000
> [ 2471.3917687] x0=fffffffdfffff000, x1=0000f9c340030000
> [ 2471.3917687] x2=ffffc008784ccb08, x3=fffffffdfffff000
> [ 2471.4017692] x4=0000000000000000, x5=0000000000200000
> [ 2471.4017692] x6=0000000000000001, x7=0000000000000003
> [ 2471.4117687] x8=6bc3000f9c33fdf7, x9=0000000000000050
> [ 2471.4217694] x10=0000000000000000, x11=0000000000400000
> [ 2471.4217694] x12=0000f6ba5a73d000, x13=0000f6ba5a73d000
> [ 2471.4317696] x14=0000000000000001, x15=0000000000001002
> [ 2471.4317696] x16=0000f6ba5a73fd50, x17=0000f6ba5a6cea74
> [ 2471.4417700] x18=0000000000000016, x19=0000f9c340030000
> [ 2471.4417700] x20=ffff00009e378f80, x21=0000000000000000
> [ 2471.4517694] x22=0000f9c340060000, x23=0000000000000000
> [ 2471.4617701] x24=ffffc00000954f68, x25=ffffc00000954af0
> [ 2471.4617701] x26=ffffc008784ccb58, x27=ffff00886c76a400
> [ 2471.4717698] x28=ffff00009bc25b60, fp=x29=ffffc008784cca90
> [ 2471.4717698] lr=x30=ffffc0000008e68c, sp=ffffc008784cca90
> [ 2471.4817706] ------------------------------------------------
> [ 2471.4817706] fp ffffc008784cca90 _pmap_pte_lookup_bs() at ffffc0000008da80 netbsd:_pmap_pte_lookup_bs+0x68
> [ 2471.4917705] fp ffffc008784ccaa0 _pmap_remove() at ffffc0000008e688 netbsd:_pmap_remove+0x80
> [ 2471.5017713] fp ffffc008784ccb10 pmap_remove() at ffffc00000090b90 netbsd:pmap_remove+0x128
> [ 2471.5117697] fp ffffc008784ccb60 uvm_unmap_remove() at ffffc000004222d8 netbsd:uvm_unmap_remove+0x258
> [ 2471.5217700] fp ffffc008784ccbe0 uvmspace_free() at ffffc00000423010 netbsd:uvmspace_free+0xc8
> [ 2471.5317697] fp ffffc008784ccc10 exit1() at ffffc00000457404 netbsd:exit1+0x174
> [ 2471.5417700] fp ffffc008784ccd00 sigexit() at ffffc0000047d030 netbsd:sigexit+0x1e8
> [ 2471.5417700] fp ffffc008784ccd50 postsig() at ffffc0000047d488 netbsd:postsig+0x280
> [ 2471.5517694] fp ffffc008784cce20 lwp_userret() at ffffc00000461d60 netbsd:lwp_userret+0x1a8
> [ 2471.5617691] fp ffffc008784cce70 trap_el0_sync() at ffffc0000008b5f8 netbsd:trap_el0_sync+0x448
> [ 2471.5717692] tf ffffc008784cced0 el0_trap() at ffffc00000088bc4 netbsd:el0_trap
> [ 2471.5817690] ---- trapframe 0xffffc008784cced0 (304 bytes) ----
> [ 2471.5817690] pc=0000fffff1e0df04, spsr=0000000020000000
> [ 2471.5917689] esr=0000000092000001, far=0000f9c340278000
> [ 2471.5917689] x0=00000002001885f0, x1=0000f9c340278000
> [ 2471.6017687] x2=0000000000037a60, x3=0000f9c3401e0000
> [ 2471.6017687] x4=0000000000000000, x5=0000fffff1e22000
> [ 2471.6117688] x6=00000002001950e0, x7=0000000001a742fc
> [ 2471.6217685] x8=0000000000000000, x9=0000000000000000
> [ 2471.6217685] x10=0000000000000007, x11=0000000000000001
> [ 2471.6317686] x12=000003e70d00b4c0, x13=000003e70d00b4c5
> [ 2471.6317686] x14=0000000000000040, x15=0000f9c3402d3150
> [ 2471.6417686] x16=0000000000000150, x17=0000000000000000
> [ 2471.6417686] x18=0000000000011e88, x19=0000000200101dc5
> [ 2471.6517684] x20=0000000200102760, x21=0000000005854375
> [ 2471.6617684] x22=0000000000000018, x23=00000002001885f0
> [ 2471.6617684] x24=0000fffff1e0e428, x25=0000000000044660
> [ 2471.6717684] x26=0000f9c3402ce400, x27=0000f9c3402ce000
> [ 2471.6717684] x28=0000000000000000, fp=x29=0000000000000000
> [ 2471.6817682] lr=x30=0000fffff1e04d38, sp=0000ffffff8a21b0
> [ 2471.6817682] ------------------------------------------------
> [ 2471.6917682] cpu17: End traceback...
> Stopped in pid 27587.27587 (as) at netbsd:cpu_Debugger+0x4: ret
>
> On Mon, 1 Jun 2020, Andrew Doran wrote:
>
> > Hi,
> >
> > I made some tweaks to the aarch64 pmap based on lessons learned in the x86
> > pmap recently. They reduce memory consumption and speed up things like
> > fork/exec/exit/UBC a little:
> >
> > http://www.netbsd.org/~ad/2020/aarch64.diff
> >
> > Approximate times for kernel build on RK3399 with all 6 cores running at
> > 600MHz:
> >
> > before  1354.07s real  6092.55s user  1591.35s system
> > after   1307.90s real  6026.60s user  1432.83s system
> >
> > Description below. Comments welcome.
> >
> > Thanks,
> > Andrew
> >
> > - Fix a lock order reversal via pmap_page_protect().
> >
> > - Align struct pmap to a cache line boundary.
> >
> > - Move wired/resident count update out from PMAPCOUNTERS ifdef in one place.
> > It shouldn't depend on it.
> >
> > - Make sure pmap is always locked when updating stats. Then atomics are no
> > longer needed to update stats.
> >
> > - Remove unneeded traversal of PV lists in pmap_enter_pv().
> >
> > - Shrink struct vm_page from 136 to 128 bytes (cache line sized - reduce
> > cache misses).
> >
> > - Shrink struct pv_entry from 48 to 32 bytes (power of 2 sized - reduce cache
> > misses).
> >
> > - Embed a pv_entry in each vm_page. That means PV entries don't need to be
> > allocated for pages that are mapped only once, for example private
> > anonymous memory / COW pages / most UBC mappings. Dynamic PV entries are
> > then used only for stuff like shared libraries and shared memory.
> >
> > - Comment out PMAPCOUNTERS option because global counters are costly on MP
> > due to cache coherency overhead. The problem gets exponentially worse as
> > more CPUs are added.
> >
> > - Use the pmap as a source of pre-zeroed pages for the VM system.
> >
> > - Do unlocked checks in pmap_page_protect() and pmap_clear_modify(): avoid
> > taking the lock if the page has no mappings.
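
For illustration only, the "unlocked check" item above could look roughly
like the user-space sketch below. Everything in it (struct ul_page, the
ul_* functions, the spin flag standing in for the real lock) is hypothetical
and is not taken from the diff. Whether skipping the lock is safe depends on
what prevents new mappings from appearing concurrently, which the message
does not spell out; the sketch simply assumes the caller guarantees it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-page mapping state. */
struct ul_page {
	atomic_uint nmappings;       /* how many mappings the page has */
	atomic_flag lock;            /* spin flag standing in for the real lock */
	unsigned lock_acquisitions;  /* instrumentation for the example only */
	bool writable;
};

static void
ul_page_init(struct ul_page *pg, unsigned nmappings)
{
	atomic_init(&pg->nmappings, nmappings);
	atomic_flag_clear(&pg->lock);
	pg->lock_acquisitions = 0;
	pg->writable = (nmappings != 0);
}

/*
 * Write-protect all mappings of a page.  Returns true if the lock had to
 * be taken, false if the unlocked check showed there was nothing to do.
 */
static bool
ul_page_protect(struct ul_page *pg)
{
	/* Fast path: no mappings, so skip the lock entirely. */
	if (atomic_load_explicit(&pg->nmappings, memory_order_relaxed) == 0)
		return false;

	/* Slow path: take the lock and re-check before modifying state. */
	while (atomic_flag_test_and_set_explicit(&pg->lock,
	    memory_order_acquire))
		continue;
	pg->lock_acquisitions++;
	if (atomic_load_explicit(&pg->nmappings, memory_order_relaxed) != 0)
		pg->writable = false;
	atomic_flag_clear_explicit(&pg->lock, memory_order_release);
	return true;
}
```

The point of the pattern is that the common case of a page with no mappings
costs one load and no lock traffic.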
> >
> >
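
As a rough illustration of the "embed a pv_entry in each vm_page" item and
of keeping flags in the low 12 bits of pv_va, here is a minimal user-space
sketch. The names (struct sk_pv, struct sk_page, sk_enter and friends) are
made up for this example and do not match the actual structures in the diff.
It also assumes 4 KiB pages, so that the low 12 bits of a page-aligned VA
are free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 4 KiB pages, so a page-aligned VA has 12 spare low bits. */
#define SK_PAGE_MASK	((uintptr_t)0xfff)

/* Hypothetical PV entry: one (pmap, va) mapping of a physical page. */
struct sk_pv {
	struct sk_pv *next;	/* further mappings of the same page */
	void *pmap;		/* owning pmap; NULL means slot unused */
	uintptr_t va_flags;	/* page-aligned VA | flags in low 12 bits */
};

/* Hypothetical page structure with the first PV entry embedded. */
struct sk_page {
	struct sk_pv first;
};

static uintptr_t
sk_pack(uintptr_t va, unsigned flags)
{
	return (va & ~SK_PAGE_MASK) | ((uintptr_t)flags & SK_PAGE_MASK);
}

static uintptr_t
sk_va(const struct sk_pv *pv)
{
	return pv->va_flags & ~SK_PAGE_MASK;
}

static unsigned
sk_flags(const struct sk_pv *pv)
{
	return (unsigned)(pv->va_flags & SK_PAGE_MASK);
}

/* Record a mapping.  Returns 0 on success, -1 if allocation fails. */
static int
sk_enter(struct sk_page *pg, void *pmap, uintptr_t va, unsigned flags)
{
	struct sk_pv *pv;

	/* First mapping: use the embedded entry, no allocation needed. */
	if (pg->first.pmap == NULL) {
		pg->first.pmap = pmap;
		pg->first.va_flags = sk_pack(va, flags);
		return 0;
	}

	/* Additional mappings (e.g. shared pages) get a dynamic entry. */
	pv = malloc(sizeof(*pv));
	if (pv == NULL)
		return -1;
	pv->pmap = pmap;
	pv->va_flags = sk_pack(va, flags);
	pv->next = pg->first.next;
	pg->first.next = pv;
	return 0;
}

/* Count dynamically allocated entries hanging off the embedded one. */
static unsigned
sk_dynamic_count(const struct sk_page *pg)
{
	unsigned n = 0;

	for (const struct sk_pv *pv = pg->first.next; pv != NULL; pv = pv->next)
		n++;
	return n;
}
```

A page that is mapped once never touches the allocator in this scheme; only
the second and later mappings pay for a dynamic entry.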