Port-arm archive
Missing gcc arm optimization?
Hi,
I've recently been optimizing the code on my arm-intr branch and,
looking at the compiler output, noticed it's doing something
sub-optimal. On the branch the splx routine looks like:
static inline void __attribute__((__unused__))
arm_intr_splx(int newspl)
{
	/* look for interrupts at the next ipl or higher */
	uint32_t iplvalue = (2 << newspl);
	uint32_t oldirqstate;

	/*
	 * disable interrupts so that if one occurs between the compare
	 * and the set it will be processed
	 */
	oldirqstate = disable_interrupts(I32_bit);
	if (ipls_pending < iplvalue)
		current_ipl_level = newspl;
	else
		arm_intr_splx_lifter(newspl);
	restore_interrupts(oldirqstate);
	return;
}
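To make the comparison concrete: assuming bit n of ipls_pending marks a pending interrupt at ipl n (the bit layout isn't shown above, so that is an assumption), 2 << newspl is the bit for newspl + 1, and ipls_pending < iplvalue holds only when nothing above newspl is pending. A small host-side sketch of just that test; the helper name nothing_pending_above is made up for illustration:

```c
#include <stdint.h>

/*
 * Hypothetical helper: the same test as in arm_intr_splx, assuming
 * bit n of the pending word means "interrupt pending at ipl n".
 */
static int
nothing_pending_above(uint32_t pending, int newspl)
{
	uint32_t iplvalue = (2u << newspl);	/* bit (newspl + 1) */

	/* any set bit at or above (newspl + 1) makes pending >= iplvalue */
	return (pending < iplvalue);
}
```

With only ipl 2 pending (0x4), lowering to 2 passes the test (0x4 < 0x8), while lowering to 1 fails it (0x4 is not below 0x4), so the lifter would be called.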
I also have an _spllower routine:
int
_spllower(int ipl)
{
	return (arm_intr_spllower(ipl));
}
arm_intr_spllower looks like:
static inline int __attribute__((__unused__))
arm_intr_spllower(int ipl)
{
	int old = current_ipl_level;

	arm_intr_splx(ipl);
	return(old);
}
Looking at the disassembly, the call to arm_intr_splx_lifter is placed
out of line, after the function's return, when it could sit in the main
code path. As it stands, the path that calls the lifter takes 3 branches
(bcs out, bl, b back) rather than just the 1:
f01add94 <_spllower>:
f01add94: e1a0c00d mov ip, sp
f01add98: e92dd830 stmdb sp!, {r4, r5, fp, ip, lr, pc}
f01add9c: e24cb004 sub fp, ip, #4 ; 0x4
f01adda0: e59fc050 ldr ip, [pc, #80] ; f01addf8 <_spllower+0x64>
f01adda4: e1a0e000 mov lr, r0
f01adda8: e59c50f8 ldr r5, [ip, #248]
f01addac: e3a03080 mov r3, #128 ; 0x80
f01addb0: e10f4000 mrs r4, CPSR
f01addb4: e1841003 orr r1, r4, r3
f01addb8: e121f001 msr CPSR_c, r1
f01addbc: e59c20f4 ldr r2, [ip, #244]
f01addc0: e243307e sub r3, r3, #126 ; 0x7e
f01addc4: e1520013 cmp r2, r3, lsl r0
f01addc8: 358c00f8 strcc r0, [ip, #248]
f01addcc: 2a000007 bcs f01addf0 <_spllower+0x5c>
f01addd0: e20420c0 and r2, r4, #192 ; 0xc0
f01addd4: e3a030c0 mov r3, #192 ; 0xc0
f01addd8: e10f0000 mrs r0, CPSR
f01adddc: e1c01003 bic r1, r0, r3
f01adde0: e0211002 eor r1, r1, r2
f01adde4: e121f001 msr CPSR_c, r1
f01adde8: e1a00005 mov r0, r5
f01addec: e89da830 ldmia sp, {r4, r5, fp, sp, pc}
f01addf0: ebffffa8 bl f01adc98 <arm_intr_splx_lifter>
f01addf4: eafffff5 b f01addd0 <_spllower+0x3c>
f01addf8: f025aa44 eornv sl, r5, r4, asr #20
This pattern of triple branching is repeated in pretty much every splx
call (sometimes with an extra instruction needed to place newspl into
r0). The code is compiled with the -current compiler using -O2.
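For anyone wanting to look at this outside the kernel, here is a rough sketch of a reduced test case. The three routines are as above; everything else (the fake CPSR, the stub bodies, the lifter_calls counter, the volatile qualifiers, the noinline attribute on the lifter) is a stand-in I've assumed so the file builds on its own, and is not the code on the branch:

```c
#include <stdint.h>

#define I32_bit	0x80	/* IRQ disable bit in the CPSR */
#define F32_bit	0x40	/* FIQ disable bit in the CPSR */

/* Stand-ins for kernel state; types and qualifiers are assumptions. */
volatile uint32_t ipls_pending;
volatile int current_ipl_level;
int lifter_calls;		/* test-only counter, not in the kernel */
static uint32_t fake_cpsr;	/* models the CPSR on a host build */

static uint32_t
disable_interrupts(uint32_t mask)
{
	uint32_t old = fake_cpsr;

	fake_cpsr |= mask;
	return (old);
}

static void
restore_interrupts(uint32_t old)
{
	/* put back only the I and F bits, as the disassembly does */
	fake_cpsr = (fake_cpsr & ~(uint32_t)(I32_bit | F32_bit)) |
	    (old & (I32_bit | F32_bit));
}

/* Stub: the real lifter would run the pending handlers. */
void __attribute__((__noinline__))
arm_intr_splx_lifter(int newspl)
{
	lifter_calls++;
	ipls_pending = 0;
	current_ipl_level = newspl;
}

static inline void __attribute__((__unused__))
arm_intr_splx(int newspl)
{
	uint32_t iplvalue = (2u << newspl);
	uint32_t oldirqstate;

	oldirqstate = disable_interrupts(I32_bit);
	if (ipls_pending < iplvalue)
		current_ipl_level = newspl;
	else
		arm_intr_splx_lifter(newspl);
	restore_interrupts(oldirqstate);
}

static inline int __attribute__((__unused__))
arm_intr_spllower(int ipl)
{
	int old = current_ipl_level;

	arm_intr_splx(ipl);
	return (old);
}

int
_spllower(int ipl)
{
	return (arm_intr_spllower(ipl));
}
```

Built with an ARM cross-compiler at -O2 and disassembled, this should at least show where the compiler places the lifter call relative to the return, though with the stubs in place of the real inline assembly the output won't match the kernel's exactly.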
How would I go about finding out why this doesn't get optimised? Or
adding an optimization to fix this?
Thanks,
Chris