Subject: Re: New in_cksum/in4_cksum implementation
To: Steve Woodford <scw@wasabisystems.com>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm
Date: 09/11/2003 09:59:28
It almost certainly wouldn't hurt anywhere, and on ARM10 it would be a win 
if this code:

.Lcksumdata_bigloop:
	ldmia	r0!, {r4, r5, r6, r7}
	adds	r2, r2, r4
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	adcs	r2, r2, r7
	ldmia	r0!, {r4, r5, r6, r7}
	adcs	r2, r2, r4
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	adcs	r2, r2, r7
	ldmia	r0!, {r4, r5, r6, r7}
	adcs	r2, r2, r4
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	adcs	r2, r2, r7
	ldmia	r0!, {r4, r5, r6, r7}
	adcs	r2, r2, r4
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	adcs	r2, r2, r7
	adc	r2, r2, #0x00		@ fold the last carry back into the sum
	subs	r1, r1, #0x40		@ 64 bytes consumed per pass
	bge	.Lcksumdata_bigloop

were rewritten as:

.Lcksumdata_bigloop:
	ldmia	r0!, {r4, r5, r6, r7}
	adds	r2, r2, r4
	adcs	r2, r2, r5
	adcs	r2, r2, r6
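	ldmia	r0!, {r4, r5, r6, r8}	@ r8 stands in for r7, not yet consumed
	adcs	r2, r2, r7		@ independent of the LDM just issued
	adcs	r2, r2, r4
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	ldmia	r0!, {r4, r5, r6, r7}	@ r7 is free again; r8 still pending
	adcs	r2, r2, r8		@ independent of the LDM just issued
	adcs	r2, r2, r4
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	ldmia	r0!, {r4, r5, r6, r8}	@ r8 stands in for r7, not yet consumed
	adcs	r2, r2, r7		@ independent of the LDM just issued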
	adcs	r2, r2, r4
	adcs	r2, r2, r5
	adcs	r2, r2, r6
	adcs	r2, r2, r8
	adc	r2, r2, #0x00
	subs	r1, r1, #0x40
	bge	.Lcksumdata_bigloop

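This uses one more register (r8: is it live at that point?), but it gives 
each LDM after the first an extra instruction, the ADCS of the previous 
group's last word, in whose shadow it can complete before its results are 
needed.  I'm not sure whether anything can be done about the first LDM in 
the sequence.  Moving it away from the first instruction in the loop body 
would mean hoisting it into the previous iteration and making it execute 
conditionally, so that it doesn't load past the end of the data on the 
last pass.  That's potentially a loss on some processors.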
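To check the reasoning about the rescheduled loop, here is a minimal C model of both loop bodies. It is a sketch, not anything taken from the actual in_cksum code. It assumes that LDMIA fills its register list in ascending register order from consecutive words, and it treats each ADDS/ADCS as one instruction slot, ignoring real ARM10 timing. The type `trace` and the functions `ldm4`, `use_reg`, `original_body` and `rescheduled_body` are hypothetical names. The model records which word each add consumes and how many independent adds separate each LDM from the first use of its data.

```c
#include <assert.h>

/* Hypothetical model of the loop bodies: registers hold the index of the
 * word they were loaded with, not its value. Timing is not modelled. */
typedef struct {
    int word[9];    /* word index held by r0..r8 (only r4..r8 used) */
    int from[9];    /* which LDM (0..3) last filled each register */
    int next_word;  /* index of the next word LDMIA r0! fetches */
    int n_ldm;      /* LDMs issued so far */
    int slack[4];   /* adds issued after LDM k but before its data is used */
    int used[4];    /* nonzero once LDM k's data has been consumed */
    int order[16];  /* word index consumed by each ADDS/ADCS, in order */
    int n_adds;
} trace;

/* LDMIA r0!, {a, b, c, d}: the list is in ascending register order,
 * so consecutive words land in a, b, c, d. */
static void ldm4(trace *t, int a, int b, int c, int d)
{
    const int regs[4] = { a, b, c, d };
    for (int i = 0; i < 4; i++) {
        t->word[regs[i]] = t->next_word++;
        t->from[regs[i]] = t->n_ldm;
    }
    t->slack[t->n_ldm] = 0;
    t->used[t->n_ldm] = 0;
    t->n_ldm++;
}

/* ADDS/ADCS r2, r2, rN: consume register r. */
static void use_reg(trace *t, int r)
{
    for (int k = 0; k < t->n_ldm; k++) {
        if (t->used[k])
            continue;
        if (k == t->from[r])
            t->used[k] = 1;   /* first use of LDM k's data */
        else
            t->slack[k]++;    /* this add does not depend on LDM k */
    }
    t->order[t->n_adds++] = t->word[r];
}

static void original_body(trace *t)
{
    for (int g = 0; g < 4; g++) {
        ldm4(t, 4, 5, 6, 7);
        use_reg(t, 4); use_reg(t, 5); use_reg(t, 6); use_reg(t, 7);
    }
}

static void rescheduled_body(trace *t)
{
    ldm4(t, 4, 5, 6, 7);
    use_reg(t, 4); use_reg(t, 5); use_reg(t, 6);
    ldm4(t, 4, 5, 6, 8);
    use_reg(t, 7); use_reg(t, 4); use_reg(t, 5); use_reg(t, 6);
    ldm4(t, 4, 5, 6, 7);
    use_reg(t, 8); use_reg(t, 4); use_reg(t, 5); use_reg(t, 6);
    ldm4(t, 4, 5, 6, 8);
    use_reg(t, 7); use_reg(t, 4); use_reg(t, 5); use_reg(t, 6); use_reg(t, 8);
}
```

In this model both bodies feed words 0 to 15 into the carry chain in the same order, so the sum is unchanged. The per-LDM slack comes out as 0, 0, 0, 0 for the original body and 0, 1, 1, 1 for the rescheduled one. The remaining 0 is the first LDM, the one that can't easily be moved.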

R.