Subject: Re: New in_cksum/in4_cksum implementation
To: Steve Woodford <scw@wasabisystems.com>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm
Date: 09/11/2003 09:59:28
It almost certainly wouldn't hurt anywhere, and on ARM10 it would be a win
if this code:
.Lcksumdata_bigloop:
	ldmia r0!, {r4, r5, r6, r7}
	adds r2, r2, r4
	adcs r2, r2, r5
	adcs r2, r2, r6
	adcs r2, r2, r7
	ldmia r0!, {r4, r5, r6, r7}
	adcs r2, r2, r4
	adcs r2, r2, r5
	adcs r2, r2, r6
	adcs r2, r2, r7
	ldmia r0!, {r4, r5, r6, r7}
	adcs r2, r2, r4
	adcs r2, r2, r5
	adcs r2, r2, r6
	adcs r2, r2, r7
	ldmia r0!, {r4, r5, r6, r7}
	adcs r2, r2, r4
	adcs r2, r2, r5
	adcs r2, r2, r6
	adcs r2, r2, r7
	adc r2, r2, #0x00
	subs r1, r1, #0x40
	bge .Lcksumdata_bigloop
could be rewritten as:
.Lcksumdata_bigloop:
	ldmia r0!, {r4, r5, r6, r7}
	adds r2, r2, r4
	adcs r2, r2, r5
	adcs r2, r2, r6
	ldmia r0!, {r4, r5, r6, r8}
	adcs r2, r2, r7
	adcs r2, r2, r4
	adcs r2, r2, r5
	adcs r2, r2, r6
	ldmia r0!, {r4, r5, r6, r7}
	adcs r2, r2, r8
	adcs r2, r2, r4
	adcs r2, r2, r5
	adcs r2, r2, r6
	ldmia r0!, {r4, r5, r6, r8}
	adcs r2, r2, r7
	adcs r2, r2, r4
	adcs r2, r2, r5
	adcs r2, r2, r6
	adcs r2, r2, r8
	adc r2, r2, #0x00
	subs r1, r1, #0x40
	bge .Lcksumdata_bigloop
This uses one more register (r8 -- is it live at that time?), but gives
each LDM after the first an extra instruction slot under which it can
operate. I'm not sure
if anything can be done about the first ldm in the sequence. Moving that
ldm away from the first instruction in the body would require making it
execute conditionally. That's potentially a lose on some processors.
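For anyone who wants to convince themselves that the reordering leaves the
result unchanged, here is a rough C model of one 64-byte pass through each
loop body. It is only a sketch: acc_t, adds, adcs, adc0, ldm4,
block_original, block_rewritten and cksum_block_demo are names made up for
illustration and are not part of the in_cksum code. It models the register
and carry-flag behaviour only (LDM leaves the flags alone, ADC without the S
suffix does not update them) and says nothing about timing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model state: r2 plus the ARM carry (C) flag. */
typedef struct {
    uint32_t sum;   /* models r2 */
    unsigned carry; /* models the C flag, 0 or 1 */
} acc_t;

/* adds r2, r2, x: add without carry-in, set C from the carry-out. */
static void adds(acc_t *a, uint32_t x) {
    uint64_t t = (uint64_t)a->sum + x;
    a->sum = (uint32_t)t;
    a->carry = (unsigned)(t >> 32);
}

/* adcs r2, r2, x: add with carry-in, set C from the carry-out. */
static void adcs(acc_t *a, uint32_t x) {
    uint64_t t = (uint64_t)a->sum + x + a->carry;
    a->sum = (uint32_t)t;
    a->carry = (unsigned)(t >> 32);
}

/* adc r2, r2, #0x00: fold the last carry back in; flags are not updated. */
static void adc0(acc_t *a) { a->sum += a->carry; }

/* ldmia r0!, {a, b, c, d}: load four words, return the advanced pointer. */
static const uint32_t *ldm4(const uint32_t *p, uint32_t *a, uint32_t *b,
                            uint32_t *c, uint32_t *d) {
    *a = p[0]; *b = p[1]; *c = p[2]; *d = p[3];
    return p + 4;
}

/* Original body: every ldm is followed at once by a use of all four regs. */
static uint32_t block_original(uint32_t r2, const uint32_t *r0) {
    acc_t a = { r2, 0 };
    uint32_t r4, r5, r6, r7;
    r0 = ldm4(r0, &r4, &r5, &r6, &r7);
    adds(&a, r4); adcs(&a, r5); adcs(&a, r6); adcs(&a, r7);
    r0 = ldm4(r0, &r4, &r5, &r6, &r7);
    adcs(&a, r4); adcs(&a, r5); adcs(&a, r6); adcs(&a, r7);
    r0 = ldm4(r0, &r4, &r5, &r6, &r7);
    adcs(&a, r4); adcs(&a, r5); adcs(&a, r6); adcs(&a, r7);
    r0 = ldm4(r0, &r4, &r5, &r6, &r7);
    adcs(&a, r4); adcs(&a, r5); adcs(&a, r6); adcs(&a, r7);
    adc0(&a);
    return a.sum;
}

/* Rewritten body: the last word of each load is added after the next ldm,
 * alternating between r7 and r8 so the pending value is not overwritten. */
static uint32_t block_rewritten(uint32_t r2, const uint32_t *r0) {
    acc_t a = { r2, 0 };
    uint32_t r4, r5, r6, r7, r8;
    r0 = ldm4(r0, &r4, &r5, &r6, &r7);
    adds(&a, r4); adcs(&a, r5); adcs(&a, r6);
    r0 = ldm4(r0, &r4, &r5, &r6, &r8);
    adcs(&a, r7); adcs(&a, r4); adcs(&a, r5); adcs(&a, r6);
    r0 = ldm4(r0, &r4, &r5, &r6, &r7);
    adcs(&a, r8); adcs(&a, r4); adcs(&a, r5); adcs(&a, r6);
    r0 = ldm4(r0, &r4, &r5, &r6, &r8);
    adcs(&a, r7); adcs(&a, r4); adcs(&a, r5); adcs(&a, r6);
    adcs(&a, r8);
    adc0(&a);
    return a.sum;
}

/* Compare both bodies on one arbitrary 16-word block; returns 1 on match. */
int cksum_block_demo(void) {
    uint32_t w[16];
    for (int i = 0; i < 16; i++)
        w[i] = 0xF0000000u + (uint32_t)i * 0x01234567u;
    assert(block_original(0xDEADBEEFu, w) == block_rewritten(0xDEADBEEFu, w));
    return 1;
}
```

The words reach the accumulator in the same order in both versions, so the
carry chain is identical; only the register that holds the fourth word of
each load changes.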
R.