Port-arm archive


Missing gcc arm optimization?



Hi,

I've recently been optimizing the code on my arm-intr branch. Looking at the compiler output, I noticed it's doing something sub-optimal. On the branch the splx routine looks like:
static inline void __attribute__((__unused__))
arm_intr_splx(int newspl)
{
   /* look for interrupts at the next ipl or higher */
   uint32_t iplvalue = (2 << newspl);
   uint32_t oldirqstate;

   /*
    * disable interrupts so that if one occurs between the compare
    * and the set it will be processed
    */
   oldirqstate = disable_interrupts(I32_bit);

   if (ipls_pending < iplvalue)
       current_ipl_level = newspl;
   else
       arm_intr_splx_lifter(newspl);

   restore_interrupts(oldirqstate);

   return;
}

I also have an _spllower routine:
int
_spllower(int ipl)
{
   return (arm_intr_spllower(ipl));
}

arm_intr_spllower looks like:
static inline int __attribute__((__unused__))
arm_intr_spllower(int ipl)
{
       int old = current_ipl_level;

       arm_intr_splx(ipl);
       return(old);
}
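
For anyone wanting to poke at the logic off-target, here is a minimal hosted sketch of the two routines above. Every model_* name is hypothetical, the lifter is only a stub (the real arm_intr_splx_lifter is not shown in this mail), and the interrupt disable/restore is dropped because a host program has nothing to mask. Going by the comment and the shift in arm_intr_splx, bit n of ipls_pending appears to stand for a pending interrupt at ipl n, so the fast path is taken only when nothing above newspl is pending.

```c
#include <assert.h>
#include <stdint.h>

/* Hosted stand-ins for the kernel state; all model_* names are hypothetical. */
uint32_t model_ipls_pending;
int model_current_ipl_level;
int model_lifter_calls;

/* Stub only: counts the call and sets the level (assumed behaviour). */
static void
model_splx_lifter(int newspl)
{
	model_lifter_calls++;
	model_current_ipl_level = newspl;
}

static inline void
model_splx(int newspl)
{
	uint32_t iplvalue;

	assert(newspl >= 0 && newspl < 31);	/* keep the shift defined */

	/* look for interrupts at the next ipl or higher */
	iplvalue = (uint32_t)2 << newspl;

	/* disable_interrupts/restore_interrupts omitted on the host */
	if (model_ipls_pending < iplvalue)
		model_current_ipl_level = newspl;	/* fast path */
	else
		model_splx_lifter(newspl);		/* slow path */
}

int
model_spllower(int ipl)
{
	int old = model_current_ipl_level;

	model_splx(ipl);
	return old;
}
```

With model_ipls_pending set to 0, model_spllower(2) takes the fast path and returns the previous level; with bit 4 set, lowering to ipl 1 goes through the stub instead.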

Looking at the disassembly, the call to arm_intr_splx_lifter seems suboptimal: it could be moved back into the main code path. Currently the code ends up taking 3 branches, rather than just 1:

f01add94 <_spllower>:
f01add94:       e1a0c00d        mov     ip, sp
f01add98:       e92dd830        stmdb   sp!, {r4, r5, fp, ip, lr, pc}
f01add9c:       e24cb004        sub     fp, ip, #4      ; 0x4
f01adda0:       e59fc050        ldr     ip, [pc, #80]   ; f01addf8 <_spllower+0x64>
f01adda4:       e1a0e000        mov     lr, r0
f01adda8:       e59c50f8        ldr     r5, [ip, #248]
f01addac:       e3a03080        mov     r3, #128        ; 0x80
f01addb0:       e10f4000        mrs     r4, CPSR
f01addb4:       e1841003        orr     r1, r4, r3
f01addb8:       e121f001        msr     CPSR_c, r1
f01addbc:       e59c20f4        ldr     r2, [ip, #244]
f01addc0:       e243307e        sub     r3, r3, #126    ; 0x7e
f01addc4:       e1520013        cmp     r2, r3, lsl r0
f01addc8:       358c00f8        strcc   r0, [ip, #248]
f01addcc:       2a000007        bcs     f01addf0 <_spllower+0x5c>
f01addd0:       e20420c0        and     r2, r4, #192    ; 0xc0
f01addd4:       e3a030c0        mov     r3, #192        ; 0xc0
f01addd8:       e10f0000        mrs     r0, CPSR
f01adddc:       e1c01003        bic     r1, r0, r3
f01adde0:       e0211002        eor     r1, r1, r2
f01adde4:       e121f001        msr     CPSR_c, r1
f01adde8:       e1a00005        mov     r0, r5
f01addec:       e89da830        ldmia   sp, {r4, r5, fp, sp, pc}
f01addf0:       ebffffa8        bl      f01adc98 <arm_intr_splx_lifter>
f01addf4:       eafffff5        b       f01addd0 <_spllower+0x3c>
f01addf8:       f025aa44        eornv   sl, r5, r4, asr #20
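
To make the branch count concrete, the listing can be traced by hand in C. This is only a reading aid: trace_spllower is a hypothetical name, it assumes newspl is below 31, and it ignores the final ldmia return as well as whatever the lifter does internally. It mirrors how the listing builds the constant 2 from the 128 it already has in r3, then counts the branch instructions taken on each path.

```c
#include <stdint.h>

/*
 * Hypothetical hosted trace of the _spllower listing.
 * Returns the number of branch instructions taken inside _spllower,
 * not counting the final ldmia that returns to the caller.
 */
int
trace_spllower(uint32_t pending, unsigned newspl)
{
	uint32_t r3 = 128;	/* mov r3, #128: mask orr'd into the CPSR */
	int taken = 0;

	r3 -= 126;		/* sub r3, r3, #126: r3 is now 2 */

	/* cmp r2, r3, lsl r0: unsigned compare of ipls_pending with 2 << newspl */
	if (pending < (r3 << newspl)) {
		/* strcc stores newspl; bcs falls through, nothing is taken */
		return taken;
	}

	taken++;		/* bcs f01addf0 */
	taken++;		/* bl  arm_intr_splx_lifter */
	taken++;		/* b   f01addd0 */
	return taken;
}
```

The fast path takes no branches at all; the slow path takes the three counted above, where a single call placed in the main path would take one.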

This pattern of triple branching is repeated in pretty much every splx call (sometimes it's needed to place newspl into r0). The code is compiled with the -current compiler using -O2.

How would I go about finding out why this doesn't get optimised? Or adding an optimization to fix this?

Thanks,
Chris


