Subject: Re: query on timing of some asm
To: Chris Gilbert <chris@paradox.demon.co.uk>
From: Richard Earnshaw <rearnsha@buzzard.freeserve.co.uk>
List: port-arm
Date: 04/10/2001 21:13:47
> Hi,
>
> I just want to clarify something about load delays and conditional
> execution. As I understand it, an ldr instruction requires 2 cycles for the
> value to be available or it stalls; also, cond exec still uses up 1 cycle even
> if the instruction isn't actually executed. So in theory:
>
> (code is from iomd_irq.S)
> This:
> ldr r6, [r7, r9, lsl #2] /* Get address of first handler structure */
> ldr r4, Lcnt /* Stat info A */
>
> teq r6, #0x00000000 /* Do we have a handler */
> moveq r0, r8 /* IRQ requests as arg 0 */
>
> will stall one cycle waiting for r6 to fill?
On XScale it will stall for one cycle. On all other ARMs to date it will
execute without stalling (provided that both ldr instructions hit cache
entries -- ARM10 has hit under miss, but it is the only one that does so
far).
>
> so does this mean that:
> ldr r6, [r7, r9, lsl #2] /* Get address of first handler structure */
> ldr r4, Lcnt /* Stat info A */
>
> mov r0, r8 /* IRQ requests as arg 0 */
> teq r6, #0x00000000 /* Do we have a handler */
>
> where the value of r0 only matters if r6 == NULL, it's overwritten elsewhere
> if r6 != NULL.
>
> would actually save 1 cycle? Or does this turn out to depend on the
> processor?
It depends on the processor. Your change above will only improve things
for XScale.
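
As a rough sketch of why the reordering helps there, here is the proposed
sequence annotated with the distance between the load and its first use.
The annotations assume that XScale wants two independent instructions
between an ldr and the first use of its result (which is what the
one-cycle stall in the original ordering implies) and that both loads hit
in the cache; neither figure is taken from a data sheet.

	ldr	r6, [r7, r9, lsl #2]	/* load of r6 issues */
	ldr	r4, Lcnt		/* 1st instruction not using r6 */
	mov	r0, r8			/* 2nd instruction not using r6; now */
					/* unconditional, so r0 is clobbered */
					/* even when r6 != NULL */
	teq	r6, #0x00000000		/* first use of r6 (assumed: no stall) */

In the original ordering only the second ldr sits between the load of r6
and the teq, so the teq has to wait one cycle on XScale. The price of the
reordering is that the mov is no longer guarded by the result of the teq,
which is only safe because r0 is overwritten elsewhere when r6 != NULL.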
Tweaking assembly files is OK, provided it doesn't obscure meaning too
much. Generic assembly files like these are a compromise between
performance and clarity for all supported systems. It's not a good idea,
for example, to have to save more registers than necessary just to avoid
stalls (unless in a very tight loop). Remember that older ARMs (i.e. up to
and including ARM7) see no benefit from these rearrangements -- and
scheduling to avoid stalling the ARM7 write buffer is a completely
different art.
R.