Missing gcc arm optimization?

To: port-arm%netbsd.org@localhost
Subject: Missing gcc arm optimization?
From: Chris Gilbert <chris%dokein.co.uk@localhost>
Date: Sat, 09 Feb 2008 17:12:02 +0000

Hi,

I've recently been optimizing the code on my arm-intr branch, and while looking at the compiler output I noticed it's doing something sub-optimal. On the branch the splx routine looks like:

static inline void __attribute__((__unused__))
arm_intr_splx(int newspl)
{
   /* look for interrupts at the next ipl or higher */
   uint32_t iplvalue = (2 << newspl);
   uint32_t oldirqstate;

   /*
    * disable interrupts so that if one occurs between the compare
    * and the set it will be processed
    */
   oldirqstate = disable_interrupts(I32_bit);

   if (ipls_pending < iplvalue)
       current_ipl_level = newspl;
   else
       arm_intr_splx_lifter(newspl);

   restore_interrupts(oldirqstate);

   return;
}
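
To make the comparison above concrete, here is a small host-side model of just the test. It is only a sketch: splx_needs_lifter() is a made-up name, and it assumes ipls_pending is a bitmask in which bit n is set while an interrupt at ipl n is pending, which is what the "next ipl or higher" comment suggests.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the test in arm_intr_splx().
 * Assumption: bit n of "pending" is set while an interrupt at ipl n
 * is waiting to be processed.
 */
static int
splx_needs_lifter(uint32_t pending, int newspl)
{
	uint32_t iplvalue;

	/* keep the shift inside the 32-bit word */
	assert(newspl >= 0 && newspl < 31);

	/* 2 << newspl is the same value as 1 << (newspl + 1) */
	iplvalue = (UINT32_C(2) << newspl);

	/*
	 * pending < iplvalue holds exactly when no bit at position
	 * newspl + 1 or above is set, so a non-zero result means
	 * something above the new level is pending.
	 */
	return (pending >= iplvalue);
}
```

So with newspl = 3, iplvalue is 16: a pending word of 0x0f stays on the fast path, while 0x10 would go to the lifter.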

I also have an _spllower routine:
int
_spllower(int ipl)
{
   return (arm_intr_spllower(ipl));
}

arm_intr_spllower looks like:
static inline int __attribute__((__unused__))
arm_intr_spllower(int ipl)
{
       int old = current_ipl_level;

       arm_intr_splx(ipl);
       return(old);
}

Looking at the disassembly of the code, it seems that the placement of the call to arm_intr_splx_lifter is suboptimal, and that it could be moved back into the main code path. Currently the path that calls the lifter ends up taking 3 branches (the bcs out, the bl, and the b back), rather than just the 1:


f01add94 <_spllower>:
f01add94:       e1a0c00d        mov     ip, sp
f01add98:       e92dd830        stmdb   sp!, {r4, r5, fp, ip, lr, pc}
f01add9c:       e24cb004        sub     fp, ip, #4      ; 0x4
f01adda0:       e59fc050        ldr     ip, [pc, #80]   ; f01addf8 <_spllower+0x64>
f01adda4:       e1a0e000        mov     lr, r0
f01adda8:       e59c50f8        ldr     r5, [ip, #248]
f01addac:       e3a03080        mov     r3, #128        ; 0x80
f01addb0:       e10f4000        mrs     r4, CPSR
f01addb4:       e1841003        orr     r1, r4, r3
f01addb8:       e121f001        msr     CPSR_c, r1
f01addbc:       e59c20f4        ldr     r2, [ip, #244]
f01addc0:       e243307e        sub     r3, r3, #126    ; 0x7e
f01addc4:       e1520013        cmp     r2, r3, lsl r0
f01addc8:       358c00f8        strcc   r0, [ip, #248]
f01addcc:       2a000007        bcs     f01addf0 <_spllower+0x5c>
f01addd0:       e20420c0        and     r2, r4, #192    ; 0xc0
f01addd4:       e3a030c0        mov     r3, #192        ; 0xc0
f01addd8:       e10f0000        mrs     r0, CPSR
f01adddc:       e1c01003        bic     r1, r0, r3
f01adde0:       e0211002        eor     r1, r1, r2
f01adde4:       e121f001        msr     CPSR_c, r1
f01adde8:       e1a00005        mov     r0, r5
f01addec:       e89da830        ldmia   sp, {r4, r5, fp, sp, pc}
f01addf0:       ebffffa8        bl      f01adc98 <arm_intr_splx_lifter>
f01addf4:       eafffff5        b       f01addd0 <_spllower+0x3c>
f01addf8:       f025aa44        eornv   sl, r5, r4, asr #20

This pattern of triple branching is repeated in pretty much every splx call (sometimes an extra instruction is needed to place newspl into r0). The code is compiled with the -current compiler using -O2.
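
For anyone wanting to look at this outside the kernel tree, a reduced standalone file along the following lines might do. It is only a sketch under stated assumptions: fake_cpsr, lifter_calls and the bodies of disable_interrupts(), restore_interrupts() and arm_intr_splx_lifter() are stand-ins made up so the file builds and runs anywhere, the two globals are assumed to be plain variables, and I32_bit is assumed to be 0x80 to match the "mov r3, #128" in the listing. Whether these stubs produce the same block layout as the real inline asm has not been checked.

```c
#include <assert.h>
#include <stdint.h>

#define I32_bit 0x80	/* assumed value, see "mov r3, #128" above */

/* assumed to be plain globals for this reduction */
volatile uint32_t ipls_pending;
volatile int current_ipl_level;

/* made-up stand-ins for the CPSR and for counting lifter calls */
static volatile uint32_t fake_cpsr;
int lifter_calls;

static inline uint32_t
disable_interrupts(uint32_t mask)
{
	uint32_t old = fake_cpsr;

	fake_cpsr = old | mask;
	return (old);
}

static inline void
restore_interrupts(uint32_t old)
{
	/* put back only the I and F bits, as the listing does */
	fake_cpsr = (fake_cpsr & ~UINT32_C(0xc0)) | (old & UINT32_C(0xc0));
}

/* stand-in body; kept out of line like the real lifter */
__attribute__((__noinline__)) void
arm_intr_splx_lifter(int newspl)
{
	/* this model expects to be entered with IRQs masked */
	assert(fake_cpsr & I32_bit);
	lifter_calls++;
	current_ipl_level = newspl;
}

static inline void __attribute__((__unused__))
arm_intr_splx(int newspl)
{
	uint32_t iplvalue = (UINT32_C(2) << newspl);
	uint32_t oldirqstate;

	oldirqstate = disable_interrupts(I32_bit);

	if (ipls_pending < iplvalue)
		current_ipl_level = newspl;
	else
		arm_intr_splx_lifter(newspl);

	restore_interrupts(oldirqstate);
}

static inline int __attribute__((__unused__))
arm_intr_spllower(int ipl)
{
	int old = current_ipl_level;

	arm_intr_splx(ipl);
	return (old);
}

int
_spllower(int ipl)
{
	return (arm_intr_spllower(ipl));
}
```

Building this for an ARM target with -O2 -S should show whether the bcs/bl/b sequence still appears, and adding gcc's -fdump-rtl-all should leave per-pass RTL dumps that could help narrow down which pass places the call out of line.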

How would I go about finding out why this doesn't get optimised, or adding an optimization to fix this?


Thanks,
Chris

Follow-Ups:
- Re: Missing gcc arm optimization?
  - From: Steve Woodford
