Subject: speeding up bzero
To: None <port-i386@netbsd.org>
From: David Laight <david@l8s.co.uk>
List: port-i386
Date: 04/11/2003 21:43:00
The patch below speeds up bzero on both P2 and old Athlons.
I suspect that it is also a gain on P3, and it is almost certainly
a win on P4.
My Athlon 700 gains about 1.5% on 8k aligned calls.
For 20-byte aligned transfers the gain is 38%.
For 20-byte misaligned transfers the gain is 29%.
The changes are twofold:
1) avoid jumps in the aligned path
2) avoid stos for small counts
16 bytes is somewhere near the breakeven point for byte transfers
(it depends on the alignment and the number of trailing bytes).
I haven't done any tests to find out at what point 'rep stosb'
wins over the byte-by-byte loop.
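
For readers who find the assembler hard to follow, here is a rough C model
of the control flow the patch is aiming for. It is only a sketch: the names
bzero_sketch and zeroes_exactly are hypothetical, the 16-byte threshold is
taken from the 'cmpl $16' in the diff, and plain C loops stand in for
'rep stosb' and 'rep stosl'. It says nothing about timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the patched routine; not the kernel code itself. */
void bzero_sketch(void *dst, size_t len)
{
    unsigned char *p = dst;
    size_t i;

    if (len < 16) {
        /* Short request: plain byte fill (the patch keeps 'rep stosb' here). */
        for (i = 0; i < len; i++)
            p[i] = 0;
        return;
    }

    /* Bytes needed to reach a 4-byte boundary; 0 when already aligned. */
    size_t head = (size_t)(-(uintptr_t)p & 3);
    len -= head;
    for (i = 0; i < head; i++)      /* byte loop rather than 'rep stosb' */
        *p++ = 0;

    /* Whole 32-bit words; memcpy of a zero word stands in for 'rep stosl'. */
    const uint32_t zero = 0;
    for (i = len >> 2; i > 0; i--) {
        memcpy(p, &zero, sizeof zero);
        p += sizeof zero;
    }

    /* 0..3 trailing bytes, again written one at a time. */
    for (i = len & 3; i > 0; i--)
        *p++ = 0;
}

/* Zero len bytes at buf+offset, then check that exactly those bytes changed. */
int zeroes_exactly(size_t offset, size_t len)
{
    unsigned char buf[128], want[128];

    assert(offset + len <= sizeof buf);
    memset(buf, 0xAA, sizeof buf);      /* guard pattern around the target */
    memset(want, 0xAA, sizeof want);
    memset(want + offset, 0, len);      /* reference result */
    bzero_sketch(buf + offset, len);
    return memcmp(buf, want, sizeof buf) == 0;
}
```

For example, zeroes_exactly(3, 20) exercises a misaligned 20-byte call, and
zeroes_exactly(0, 20) an aligned one.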
David
Index: bzero.S
===================================================================
RCS file: /cvsroot/src/sys/lib/libkern/arch/i386/bzero.S,v
retrieving revision 1.6
diff -u -p -r1.6 bzero.S
--- bzero.S 1998/02/22 08:14:57 1.6
+++ bzero.S 2003/04/11 20:34:21
@@ -12,34 +12,48 @@
ENTRY(bzero)
pushl %edi
movl 8(%esp),%edi
- movl 12(%esp),%edx
+ movl 12(%esp),%ecx
cld /* set fill direction forward */
xorl %eax,%eax /* set fill data to 0 */
/*
* if the string is too short, it's really not worth the overhead
* of aligning to word boundries, etc. So we jump to a plain
* unaligned set.
*/
- cmpl $16,%edx
+ cmpl $16,%ecx
jb L1
- movl %edi,%ecx /* compute misalignment */
- negl %ecx
- andl $3,%ecx
- subl %ecx,%edx
- rep /* zero until word aligned */
- stosb
-
- movl %edx,%ecx /* zero by words */
+ movl %edi,%edx /* detect misalignment */
+ andl $3,%edx
+ jnz align
+aligned:
+ movl %ecx,%edx /* zero by words */
shrl $2,%ecx
andl $3,%edx
rep
stosl
+ jnz do_remainder
+ pop %edi
+ ret
-L1: movl %edx,%ecx /* zero remainder by bytes */
- rep
+align:
+ negl %edx; andl $3,%edx /* bytes needed to reach alignment */
+ subl %edx,%ecx /* remove from main count */
+do_remainder:
+1: movb %al,(%edi) /* storing byte by byte is... */
+ inc %edi /* ...faster than rep stosb */
+ dec %edx
+ jnz 1b
+ test %ecx,%ecx /* zero if doing remainder */
+ jnz aligned
+ pop %edi
+ ret
+
+L1: rep
stosb
popl %edi
ret
--
David Laight: david@l8s.co.uk