Subject: Re: New in_cksum/in4_cksum implementation
To: None <Richard.Earnshaw@arm.com>
From: David Laight <david@l8s.co.uk>
List: port-arm
Date: 09/11/2003 12:00:53
As an alternative, why not try to replace:
> .Lcksumdata_bigloop:
> 	ldmia	r0!, {r4, r5, r6, r7}
> 	adds	r2, r2, r4
> 	adcs	r2, r2, r5
> 	adcs	r2, r2, r6
> 	adcs	r2, r2, r7
> 	ldmia	r0!, {r4, r5, r6, r7}
> 	adcs	r2, r2, r4
> 	adcs	r2, r2, r5
> 	adcs	r2, r2, r6
> 	adcs	r2, r2, r7
> 	ldmia	r0!, {r4, r5, r6, r7}
> 	adcs	r2, r2, r4
> 	adcs	r2, r2, r5
> 	adcs	r2, r2, r6
> 	adcs	r2, r2, r7
> 	ldmia	r0!, {r4, r5, r6, r7}
> 	adcs	r2, r2, r4
> 	adcs	r2, r2, r5
> 	adcs	r2, r2, r6
> 	adcs	r2, r2, r7
> 	adc	r2, r2, #0x00
> 	subs	r1, r1, #0x40
> 	bge	.Lcksumdata_bigloop
with:
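	ldmia	r0!, {r4, r5, r6, r7}	/* preload the first 16 bytes */
.Lcksumdata_bigloop:
	adds	r2, r2, r4		/* sum the block loaded last time */
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	adcs	r2, r2, r7
	ldmia	r0!, {r4, r5, r6, r7}	/* load next block; flags untouched */
	adc	r2, r2, #0x00		/* fold in the carry from the adcs */
	subs	r1, r1, #0x10
	bge	.Lcksumdata_bigloop
	adds	r2, r2, r4		/* drain: sum the last block loaded */
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	adcs	r2, r2, r7
	adc	r2, r2, #0x00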
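That form may not even need unrolling: each ldmia is issued ahead of the
adc/subs/bge, and its data isn't used until the next pass round the loop.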
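For anyone who wants to check the loop structure off-target, here is a rough C model of the same shape: preload one block, then add the previous block and load the next on each pass, then drain the last block after the loop. The names (cksum_simple, cksum_pipelined, fold16) are made up for illustration. It assumes the count is arranged so that the last load inside the loop fetches the last block, so nothing is read past the end. It models the carry by folding it back in after every add rather than through the C flag, so it demonstrates the load ordering, not cycle-level behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a 32-bit add with the carry folded back in (end-around carry). */
static uint32_t add_carry(uint32_t sum, uint32_t w)
{
    uint64_t t = (uint64_t)sum + w;
    return (uint32_t)t + (uint32_t)(t >> 32);
}

/* Stand-in for "ldmia r0!, {r4-r7}": copy four words into r[]. */
static void load_block(uint32_t r[4], const uint32_t *p)
{
    for (int i = 0; i < 4; i++)
        r[i] = p[i];
}

/* Stand-in for the adds/adcs/adcs/adcs/adc group on r4-r7. */
static uint32_t add_block(uint32_t sum, const uint32_t r[4])
{
    for (int i = 0; i < 4; i++)
        sum = add_carry(sum, r[i]);
    return sum;
}

/* Reference: sum the words one at a time, no pipelining. */
uint32_t cksum_simple(const uint32_t *p, size_t nwords, uint32_t sum)
{
    for (size_t i = 0; i < nwords; i++)
        sum = add_carry(sum, p[i]);
    return sum;
}

/* Pipelined shape: the load for block i+1 sits after the adds for block i. */
uint32_t cksum_pipelined(const uint32_t *p, size_t nblocks, uint32_t sum)
{
    uint32_t r[4];

    if (nblocks == 0)
        return sum;

    load_block(r, p);               /* load before the loop */
    p += 4;
    for (size_t i = 1; i < nblocks; i++) {
        sum = add_block(sum, r);    /* consume the previous load */
        load_block(r, p);           /* issue the next load */
        p += 4;
    }
    return add_block(sum, r);       /* drain the final block */
}

/* Fold a 32-bit partial sum down to 16 bits. */
uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}
```

Calling cksum_pipelined(buf, n, 0) on n 16-byte blocks should give the same partial sum as cksum_simple(buf, 4 * n, 0).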
On an SA1100 I found that an extra "ldr rx,[r0],#n" after the ldmia
helped memcpy (fetches the cache line, but never stalls on the result).
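As a very rough C rendering of that idea (one reading of it, not the SA1100 code itself): touch a word in the next cache line before copying the current one, and throw the value away. The function name copy_words_touch and the 32-byte line size are assumptions for illustration, and the volatile read only stands in for the unused ldr. Whether a compiler turns it into a load that doesn't stall depends on the compiler and the CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_WORDS 8    /* assumed 32-byte cache line, 4-byte words */

/* Copy nwords 32-bit words, touching the next source line ahead of use. */
void copy_words_touch(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords >= LINE_WORDS) {
        if (nwords > LINE_WORDS) {
            /* Read one word of the next line and discard it; the read
             * stays inside the buffer because nwords > LINE_WORDS. */
            (void)*(const volatile uint32_t *)(src + LINE_WORDS);
        }
        memcpy(dst, src, LINE_WORDS * sizeof *src);
        dst += LINE_WORDS;
        src += LINE_WORDS;
        nwords -= LINE_WORDS;
    }
    /* Copy whatever is left over, word by word. */
    while (nwords-- > 0)
        *dst++ = *src++;
}
```

The result is the same as a plain copy; only the order of the memory accesses changes.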
David
--
David Laight: david@l8s.co.uk