Port-arm archive
Re: aarch64 performance tweaks
>- Have unlikely conditional branches go forwards to help the static branch
> predictor.
This is very interesting, but I wonder how effective it is in practice.
The function itself is small enough that aligning functions to the cache line size may be more effective.
(We might want to try CFLAGS+=-falign-functions=... and increase _ALIGN_TEXT.)
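To make the two options concrete, here is a minimal sketch in plain C, not code from the tree. It assumes GCC or Clang and a 64-byte cache line; UNLIKELY(), CACHE_LINE, sum_bytes() and sum_bytes_is_aligned() are hypothetical names used only for illustration. The hint asks the compiler to treat the error path as cold (compilers usually move such a path out of the fall-through sequence, but that is not guaranteed), and the attribute aligns one function instead of all of them as -falign-functions would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64 bytes is a common cache line size; check the target CPU. */
#define CACHE_LINE 64
static_assert((CACHE_LINE & (CACHE_LINE - 1)) == 0,
    "alignment must be a power of two");

/* Hypothetical hint macro: tells the compiler the condition is rarely true. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical hot function. The attribute requests cache line alignment
 * for this one function only.
 */
__attribute__((aligned(CACHE_LINE), noinline))
int
sum_bytes(const uint8_t *buf, size_t len, uint32_t *out)
{
	uint32_t sum = 0;

	if (UNLIKELY(buf == NULL || out == NULL))
		return -1;		/* cold path */

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	*out = sum;
	return 0;
}

/* Returns nonzero if sum_bytes() starts on a cache line boundary. */
int
sum_bytes_is_aligned(void)
{
	return ((uintptr_t)sum_bytes % CACHE_LINE) == 0;
}
```

Whether either option actually helps would need to be measured on real hardware.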
>- Use tpidr_el1 to hold curlwp and not curcpu, because curlwp is accessed
> much more often by MI code. It also makes curlwp preemption safe and
> allows aarch64_curlwp() to be a const function (curcpu must be volatile).
BTW, NetBSD/aarch64 uses tpidr_*el0* for userland's TLS; it is saved to l_private on exception entry and restored from l_private at eret.
Therefore tpidr_el0 can be used freely in kernel context, and we might be able to use it to hold curcpu.
(Of course, to do this we would need to set "tpidr_el0 = curlwp->l_cpu" in el0_trap and lwp_trampoline.)
However, I'm not sure how effective this would be compared to using curlwp->l_cpu... :-P
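The idea can be sketched as a small userland model. This is only an illustration under stated assumptions, not kernel code: the two system registers are modelled as plain variables, the structs are cut-down stand-ins that keep only l_cpu and l_private, and every model_* name is hypothetical. In the kernel the accesses would be mrs/msr instructions in el0_trap and lwp_trampoline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-ins for the real kernel structures. */
struct cpu_info {
	int ci_index;
};

struct lwp {
	struct cpu_info *l_cpu;		/* CPU this lwp is running on */
	void *l_private;		/* userland TLS pointer */
};

/* Model of the two system registers as plain variables. */
static uintptr_t model_tpidr_el0;	/* userland: TLS, kernel: curcpu */
static uintptr_t model_tpidr_el1;	/* curlwp */

static inline struct lwp *
model_curlwp(void)
{
	return (struct lwp *)model_tpidr_el1;
}

static inline struct cpu_info *
model_curcpu(void)
{
	return (struct cpu_info *)model_tpidr_el0;
}

/* Exception entry from EL0: stash userland TLS, then load curcpu. */
void
model_el0_trap_enter(void)
{
	struct lwp *l = model_curlwp();

	assert(l != NULL);
	l->l_private = (void *)model_tpidr_el0;
	model_tpidr_el0 = (uintptr_t)l->l_cpu;
}

/* Just before eret: give userland its TLS pointer back. */
void
model_el0_trap_exit(void)
{
	struct lwp *l = model_curlwp();

	assert(l != NULL);
	model_tpidr_el0 = (uintptr_t)l->l_private;
}
```

In this model curcpu becomes a single register read inside the kernel, at the cost of one extra register write on every entry from EL0.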
Everything else looks good to me. Thank you for the great work!
--
ryo shimizu