Port-arm archive


Re: CVS commit: src/sys/arch/arm/ep93xx



Joerg Sonnenberger wrote:
> On Tue, May 27, 2008 at 01:25:38AM +0900, HAMAJIMA Katsuomi wrote:
>>> Are you sure that GCC doesn't do exactly that? For unsigned arithmetic,
>>> GCC will normally use unsigned mul + shift and not a division. It would
>>> be strongly preferred to not have inline assembly here.
>> I do not understand your opinion, sorry. I attach disassembled delay().
>> Please tell me details.
>
> Hm. My MIPS is very basic, but it seems like it generates external calls
> for the division and does not optimise it as it should. Can someone
> confirm that?

Heh, that's actually ARM assembler, not MIPS :) They do look very similar, though.

Because these ARM cores have no hardware divide instruction, it's expected that a general division is carried out in software.

However, the actual issue is that gcc's default target is an ARM processor without umull, so it cannot turn the division by a constant into a multiply and shift. The kernel config should add:
makeoptions     CPUFLAGS="-march=armv4 -mtune=arm9"

Or perhaps that should be in files.ep93xx. It would then benefit the whole kernel.

With that option, the code and gcc output are as attached, i.e. using umull and some shifts instead of a call to a division routine.
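For anyone wondering what the umull and shifts are doing, here is a minimal C sketch of the same arithmetic. The constant 1125899907 is taken from the literal pool in the attached listing and equals ceil(2^50 / 1000000); the value 983040 for TIMER_FREQ is an assumption inferred from the `(n << 20) - (n << 16)` sequence in the listing, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the listing: (n << 20) - (n << 16) == n * 983040. */
#define TIMER_FREQ 983040u
/* ceil(2^50 / 1000000), the first word at .L14 in the listing. */
#define RECIP_1E6  1125899907u

/* Divide by 1000000 with a 32x32->64 multiply and a shift. */
static uint32_t
div_1e6_mulshift(uint32_t x)
{
        uint64_t wide = (uint64_t)x * RECIP_1E6; /* what umull computes */

        /* Keep the high word and shift it by 18: 32 + 18 = 50 bits. */
        return (uint32_t)(wide >> 50);
}

/* Same expression as in delay(), including its 32-bit wraparound. */
static uint32_t
us_to_ticks(uint32_t n)
{
        return div_1e6_mulshift(n * TIMER_FREQ);
}

/* Hypothetical helper: compare against plain division for one value. */
static void
check_one(uint32_t x)
{
        assert(div_1e6_mulshift(x) == x / 1000000u);
}
```

Calling check_one() on a handful of values, including UINT32_MAX, is a quick way to convince yourself the two forms agree.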

One thing I do note is that the comment about reading TIMER4VAL early, so that the setup overhead is counted, doesn't match where the read actually happens in the asm. I guess the compiler is reordering the code as it optimizes. Some kind of barrier is probably needed to make the TIMER4VAL read stay where it was written.
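A rough sketch of what such a barrier could look like, not a tested patch for the driver. It uses a GCC-style empty asm statement. The timer register is replaced by a hypothetical volatile variable so the fragment stands alone, TIMER_FREQ is assumed to be 983040 (inferred from the listing), and ticks_for_delay() is a made-up name.

```c
#include <stdint.h>

/* Hypothetical stand-in for the memory-mapped timer register. */
static volatile uint32_t fake_timer4;
#define TIMER4VAL()     (fake_timer4)
#define TIMER_FREQ      983040u /* assumed, inferred from the listing */

static int32_t
ticks_for_delay(uint32_t n, uint32_t *initial_tick)
{
        /* Read the counter first, as the comment in delay() intends. */
        *initial_tick = TIMER4VAL();

        /*
         * Compiler barrier (GCC extension). The "memory" clobber keeps
         * memory accesses from being moved across this point, and the
         * "+r"(n) operand makes every later use of n depend on it, so
         * the multiply below should not be scheduled ahead of the read.
         */
        __asm__ volatile("" : "+r"(n) : : "memory");

        return (int32_t)(n * TIMER_FREQ / 1000000u);
}
```

This emits no instructions of its own. Whether the extra ordering is worth it for a few cycles of setup overhead is a separate question.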

Thanks,
Chris





void
delay(unsigned int n)
{
        unsigned int cur_tick, initial_tick;
        int remaining;

#ifdef DEBUG
        if (epclk_sc == NULL) {
                printf("delay: called before start epclk\n");
                return;
        }
#endif

        /*
         * Read the counter first, so that the rest of the setup overhead is
         * counted.
         */
        initial_tick = TIMER4VAL();

        remaining = n * TIMER_FREQ / 1000000;

        while (remaining > 0) {
                cur_tick = TIMER4VAL();
                if (cur_tick >= initial_tick)
                        remaining -= cur_tick - initial_tick;
                else
                        remaining -= UINT_MAX - initial_tick + cur_tick + 1;
                initial_tick = cur_tick;
        }
}

delay:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        mov     r3, r0, asl #20
        ldr     r2, .L14
        sub     r3, r3, r0, asl #16
        umull   r0, r1, r2, r3
        str     lr, [sp, #-4]!
        mov     r1, r1, lsr #18
        ldr     r3, .L14+4
        cmp     r1, #0
        ldr     r0, [r3, #96]
        ldrle   pc, [sp], #4
        mov     lr, r3
.L7:
        ldr     r2, [lr, #96]
        mvn     r3, r0
        add     r3, r2, r3
        cmp     r2, r0
        rsb     ip, r0, r2
        rsb     r3, r3, r1
        subcc   r1, r3, #1
        rsbcs   r1, ip, r1
        cmp     r1, #0
        mov     r0, r2
        bgt     .L7
        ldr     pc, [sp], #4
.L15:
        .align  2
.L14:
        .word   1125899907
        .word   -267321344
        .size   delay, .-delay
        .align  2

