Subject: Re: bcopy optimisation
To: None <port-arm32@NetBSD.ORG>
From: Olly Betts <olly@MANTIS.CO.UK>
List: port-arm32
Date: 07/04/1996 14:28:36
"Mark Brinicombe" writes:
>[Fast bcopy required]
>
>In addition to making it fast typically using the LDM and STM instructions
>consideration needs to be given to the sizes being copied. Logging statistics
>for the bcopy routine shows that it is regularly called for certain sizes
>of copy far more frequently than others.
>The most common sizes are 12, 8, 128, 6, 4, 16, 2 in that order.
>This may mean that the best performance will be gained if these sizes are
>spotted and specially coded.
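As a rough sketch of what "spotting" those sizes at run time might look like in C (not the actual NetBSD bcopy): the name bcopy_sketch and the COPY_FIXED helper macro are made up for illustration, and the case order simply follows the frequency list quoted above. Each fixed-size case goes through a small temporary so that overlapping regions still behave as bcopy requires; everything else falls back to memmove.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy exactly n bytes via a temporary buffer so that
 * overlapping source and destination are still handled correctly.
 * It relies on the names `src` and `dst` being in scope at the call site. */
#define COPY_FIXED(n)                 \
    do {                              \
        unsigned char tmp[n];         \
        memcpy(tmp, src, n);          \
        memcpy(dst, tmp, n);          \
    } while (0)

/* Hypothetical size-dispatching copy. Note the bcopy argument order:
 * source first, destination second. */
static void bcopy_sketch(const void *src, void *dst, size_t len)
{
    switch (len) {
    /* Most frequent sizes from the quoted statistics, in that order. */
    case 12:  COPY_FIXED(12);  break;
    case 8:   COPY_FIXED(8);   break;
    case 128: COPY_FIXED(128); break;
    case 6:   COPY_FIXED(6);   break;
    case 4:   COPY_FIXED(4);   break;
    case 16:  COPY_FIXED(16);  break;
    case 2:   COPY_FIXED(2);   break;
    default:
        /* Any other length: generic, overlap-safe copy. */
        memmove(dst, src, len);
        break;
    }
}
```

Whether the switch pays for itself depends on how cheaply the compiler dispatches on len, so this would need measuring rather than assuming.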
If most of these copies are done with a constant length, i.e.:
    memcpy( a, b, 12 );
rather than:
    memcpy( a, b, len );
where len is usually 12, then it might be better to get the compiler to
spot them and call a tailored routine _memcpy12( a, b ), which could then
be inlined. Here's a version of _memcpy2 to clarify the idea:
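_memcpy2
        LDRB    R2,[R1]         ; load byte 0 from source (R1)
        LDRB    R3,[R1,#1]      ; load byte 1 from source
        STRB    R2,[R0]         ; store byte 0 to destination (R0)
        STRB    R3,[R0,#1]      ; store byte 1 to destination
        MOV     PC,R14          ; return to caller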
Does anyone know how easy it is to get GCC to do this sort of thing?
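One possible way to approach it, sketched in C under some assumptions: GCC provides the __builtin_constant_p extension, which reports whether an expression is known to be a compile-time constant, so a wrapper macro can route constant-length calls to a fixed-size routine. The names memcpy2_sketch and MEMCPY_SKETCH are invented for illustration, and only the 2-byte case is shown; a _memcpy12 equivalent would follow the same pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C counterpart of the _memcpy2 routine: copies exactly two
 * bytes, loading both before storing, and is small enough to inline. */
static inline void *memcpy2_sketch(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    unsigned char b0 = s[0];
    unsigned char b1 = s[1];

    d[0] = b0;
    d[1] = b1;
    return dst;
}

#if defined(__GNUC__)
/* If len is a compile-time constant equal to 2, use the tailored routine;
 * otherwise fall through to the ordinary library memcpy. When len is not
 * constant, the && short-circuits, so len is evaluated only once. */
#define MEMCPY_SKETCH(dst, src, len)                      \
    ((__builtin_constant_p(len) && (len) == 2)            \
         ? memcpy2_sketch((dst), (src))                   \
         : memcpy((dst), (src), (len)))
#else
/* Without the GCC extension, just use memcpy. */
#define MEMCPY_SKETCH(dst, src, len) memcpy((dst), (src), (len))
#endif
```

A call such as MEMCPY_SKETCH(a, b, 2) would then pick the fixed-size routine, while MEMCPY_SKETCH(a, b, len) with a variable len still goes to memcpy.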
>If you want to go further the alignment of the src and destination addresses
>needs to be looked at, again to help design the best bcopy for the job.
>
>Any takers for the job or do I have to add this to my todo list ?
I'm happy to write this sort of stuff, though I may not get it done very
quickly. I've already written a very fast strlen(), if that's any use.
Olly