Source-Changes archive
Re: CVS commit: src/sys/arch/arm/ep93xx
Joerg Sonnenberger wrote:
> On Tue, May 27, 2008 at 01:25:38AM +0900, HAMAJIMA Katsuomi wrote:
>>> Are you sure that GCC doesn't do exactly that? For unsigned arithmetic,
>>> GCC will normally use unsigned mul + shift and not a division. It would
>>> be strongly preferred to not have inline assembly here.
>> I do not understand your opinion, sorry. I attach disassembled delay().
>> Please tell me details.
> Hm. My MIPS is very basic, but it seems like it generates external calls
> for the division and does not optimise it as it should. Can someone
> confirm that?
heh, that's actually arm assembler, not MIPS :) They do look very
similar though.
Because this arm core has no hardware divide instruction, it's expected
that it will carry out division in software.
However, the actual issue is that gcc by default targets an arm processor
without umull. The kernel config should add:
makeoptions CPUFLAGS="-march=armv4 -mtune=arm9"
Or perhaps that should be in files.ep93xx. It would then benefit the
whole kernel.
With this, gcc produces the attached output for the attached code, i.e. it
uses umull and some shifts.
One thing I do note is that the comment about reading TIMER4VAL early to
allow it to count the overhead, and the actual read point in the asm,
aren't matched up. I guess the compiler is reordering the code while
optimizing; some kind of barrier is probably needed to make the TIMER4VAL
read stick where it was written.
Thanks,
Chris
void
delay(unsigned int n)
{
	unsigned int cur_tick, initial_tick;
	int remaining;

#ifdef DEBUG
	if (epclk_sc == NULL) {
		printf("delay: called before start epclk\n");
		return;
	}
#endif

	/*
	 * Read the counter first, so that the rest of the setup overhead is
	 * counted.
	 */
	initial_tick = TIMER4VAL();

	remaining = n * TIMER_FREQ / 1000000;

	while (remaining > 0) {
		cur_tick = TIMER4VAL();
		if (cur_tick >= initial_tick)
			remaining -= cur_tick - initial_tick;
		else
			remaining -= UINT_MAX - initial_tick + cur_tick + 1;
		initial_tick = cur_tick;
	}
}
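
The message notes that the TIMER4VAL read does not stay where the source comment puts it and that some kind of barrier is probably needed. The following is only a sketch of one way such a barrier might look with GCC's extended asm, not the fix that was committed. The names fake_timer4, TIMER_FREQ_ASSUMED and ticks_after_early_read are hypothetical stand-ins so the fragment compiles outside the kernel, and 983040 is a value inferred from the shifts in the assembly listing, not taken from the ep93xx headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, inferred from (n << 20) - (n << 16) in the listing. */
#define TIMER_FREQ_ASSUMED 983040u

/* Hypothetical stand-in for the memory-mapped counter behind TIMER4VAL(). */
volatile uint32_t fake_timer4;

uint32_t
ticks_after_early_read(uint32_t n, uint32_t *initial_tick)
{
	/* Volatile load: it must be performed, but plain arithmetic may
	 * otherwise still be scheduled ahead of it. */
	*initial_tick = fake_timer4;

	/*
	 * Empty asm statement (GCC extension). The "+r"(n) operand tells
	 * the compiler that n may change here, so the multiply below cannot
	 * be computed before this point; the "memory" clobber keeps memory
	 * accesses from being moved across it.
	 */
	__asm__ __volatile__("" : "+r"(n) : : "memory");

	return n * TIMER_FREQ_ASSUMED / 1000000;
}

/* Small self-check using the stand-in counter. */
void
early_read_selftest(void)
{
	uint32_t tick;

	fake_timer4 = 42;
	assert(ticks_after_early_read(1000, &tick) == 983);
	assert(tick == 42);
}
```

Whether this actually places the load ahead of the multiply on a given compiler version would need to be confirmed by looking at the disassembly again.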
delay:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	mov	r3, r0, asl #20
	ldr	r2, .L14
	sub	r3, r3, r0, asl #16
	umull	r0, r1, r2, r3
	str	lr, [sp, #-4]!
	mov	r1, r1, lsr #18
	ldr	r3, .L14+4
	cmp	r1, #0
	ldr	r0, [r3, #96]
	ldrle	pc, [sp], #4
	mov	lr, r3
.L7:
	ldr	r2, [lr, #96]
	mvn	r3, r0
	add	r3, r2, r3
	cmp	r2, r0
	rsb	ip, r0, r2
	rsb	r3, r3, r1
	subcc	r1, r3, #1
	rsbcs	r1, ip, r1
	cmp	r1, #0
	mov	r0, r2
	bgt	.L7
	ldr	pc, [sp], #4
.L15:
	.align	2
.L14:
	.word	1125899907
	.word	-267321344
	.size	delay, .-delay
	.align	2
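
For readers following the "unsigned mul + shift" point: the umull and lsr #18 pair in the listing above take the high word of a 64-bit product and shift it a further 18 bits, which amounts to multiplying by 1125899907 (the first .word at .L14) and shifting right by 50. A minimal C sketch of the same arithmetic follows. The function and macro names are made up for illustration, and 983040 is inferred from the two shifts at the top of the listing rather than read from the ep93xx headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: (n << 20) - (n << 16) == n * 983040 in the listing. */
#define TIMER_FREQ_ASSUMED 983040u

/* ceil(2^50 / 1000000), the constant loaded from .L14. */
#define MAGIC_1E6 1125899907u

/* x / 1000000 without a divide: 32x32->64 multiply, then shift by 50. */
uint32_t
div_by_1e6(uint32_t x)
{
	return (uint32_t)(((uint64_t)x * MAGIC_1E6) >> 50);
}

/* Same arithmetic as "n * TIMER_FREQ / 1000000" in delay(), including
 * the 32-bit wraparound of the product for large n. */
uint32_t
usec_to_ticks(uint32_t n)
{
	return div_by_1e6(n * TIMER_FREQ_ASSUMED);
}

/* Spot checks against the ordinary division operator. */
void
div_by_1e6_selftest(void)
{
	assert(div_by_1e6(0) == 0);
	assert(div_by_1e6(999999) == 0);
	assert(div_by_1e6(1000000) == 1);
	assert(div_by_1e6(UINT32_MAX) == UINT32_MAX / 1000000);
}
```

The shift of 50 splits into 32 (taking r1, the high word of the umull result) plus the explicit lsr #18.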