Subject: Re: New in_cksum/in4_cksum implementation
To: Richard.Earnshaw@arm.com
From: David Laight <david@l8s.co.uk>
List: port-arm
Date: 09/11/2003 12:00:53
As an alternative, why not try to replace:

> .Lcksumdata_bigloop:
> 	ldmia	r0!, {r4, r5, r6, r7}
> 	adds	r2, r2, r4
> 	adcs	r2, r2, r5
> 	adcs	r2, r2, r6
> 	adcs	r2, r2, r7
> 	ldmia	r0!, {r4, r5, r6, r7}
> 	adcs	r2, r2, r4
> 	adcs	r2, r2, r5
> 	adcs	r2, r2, r6
> 	adcs	r2, r2, r7
> 	ldmia	r0!, {r4, r5, r6, r7}
> 	adcs	r2, r2, r4
> 	adcs	r2, r2, r5
> 	adcs	r2, r2, r6
> 	adcs	r2, r2, r7
> 	ldmia	r0!, {r4, r5, r6, r7}
> 	adcs	r2, r2, r4
> 	adcs	r2, r2, r5
> 	adcs	r2, r2, r6
> 	adcs	r2, r2, r7
> 	adc	r2, r2, #0x00
> 	subs	r1, r1, #0x40
> 	bge	.Lcksumdata_bigloop

with:

	ldmia	r0!, {r4, r5, r6, r7}
.Lcksumdata_bigloop:
	adds	r2, r2, r4
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	adcs	r2, r2, r7
	ldmia	r0!, {r4, r5, r6, r7}
	adc	r2, r2, #0x00
	subs	r1, r1, #0x10
	bge	.Lcksumdata_bigloop
	adds	r2, r2, r4
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	adcs	r2, r2, r7
	adc	r2, r2, #0x00

This version may not even need unrolling.
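
For anyone who finds the restructuring easier to follow in C, here is a
rough sketch of the same idea: load the first group of four words before
the loop, have each iteration add the group loaded previously while
fetching the next one, and add the last group after the loop. The names
(add4_fold, sum_plain, sum_pipelined) are made up for illustration and do
not come from the NetBSD sources. The sketch also assumes the length is a
non-zero multiple of four words, and it picks the trip count so that it
never reads past the end of the buffer, which is an assumption about how
r1 would be biased before entering the assembly loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add four words to sum with end-around carry (hypothetical helper). */
static uint32_t add4_fold(uint32_t sum, uint32_t a, uint32_t b,
                          uint32_t c, uint32_t d)
{
    uint64_t t = (uint64_t)sum + a + b + c + d; /* cannot overflow 64 bits */
    t = (t & 0xffffffffu) + (t >> 32);          /* fold the carries back in */
    t = (t & 0xffffffffu) + (t >> 32);          /* first fold may carry again */
    return (uint32_t)t;
}

/* Plain shape: each iteration loads a group and adds it straight away. */
uint32_t sum_plain(const uint32_t *p, size_t nwords)
{
    assert(nwords >= 4 && nwords % 4 == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i += 4)
        sum = add4_fold(sum, p[i], p[i + 1], p[i + 2], p[i + 3]);
    return sum;
}

/* Pipelined shape: the load for the next group is issued in the same
 * iteration that adds the group loaded one step earlier. */
uint32_t sum_pipelined(const uint32_t *p, size_t nwords)
{
    assert(nwords >= 4 && nwords % 4 == 0);
    uint32_t sum = 0;
    /* Prologue: corresponds to the ldmia placed before the loop label. */
    uint32_t a = p[0], b = p[1], c = p[2], d = p[3];
    size_t groups_left = nwords / 4 - 1;
    p += 4;
    while (groups_left > 0) {
        sum = add4_fold(sum, a, b, c, d);       /* add the previous group */
        a = p[0]; b = p[1]; c = p[2]; d = p[3]; /* load the next group */
        p += 4;
        groups_left--;
    }
    /* Epilogue: the final loaded group still has to be added. */
    return add4_fold(sum, a, b, c, d);
}
```

Both functions should return the same value for any valid input; the only
difference is where the loads sit relative to the adds, which is what gives
the load time to complete before its result is needed.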

On an SA1100 I found that an extra "ldr rx,[r0],#n" after the ldmia
helped memcpy (fetches the cache line, but never stalls on the result).



	David

-- 
David Laight: david@l8s.co.uk