Subject: Re: Performance of various memcpy()'s
To: David Laight <david@l8s.co.uk>
From: Bang Jun-Young <junyoung@mogua.com>
List: tech-userlevel
Date: 10/28/2002 18:46:23
[Oops, sorry: the previous mail was sent in the euc-kr charset because
of a bug in Mutt.]
On Mon, Oct 28, 2002 at 09:01:11AM +0000, David Laight wrote:
> Given the significant performance improvement, I'd go for:
>
> > +ENTRY(memcpy)
> > + pushl %esi
> > + pushl %edi
> > +
> > + movl 12(%esp),%edi
> > + movl 16(%esp),%esi
> > + movl 20(%esp),%ecx
> > + movl %edi,%eax /* return value */
> > +
> > + movl %ecx,%edx
> > + cld /* nope, copy forwards. */
> > + shrl $2,%ecx /* copy by words */
> > + rep
> > + movsl
>
> andl $3,%edx
> jne 1f
> popl %edi
> popl %esi
> ret
> 1:
> > + movl %edx,%ecx
> > + rep
> > + movsb
> > + popl %edi
> > + popl %esi
> > + ret
That is already included in my new i686_copy{in,out}(), in a (hopefully)
cleaner and shorter form. I'm still measuring how much it gains.
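For readers who prefer C, the control flow of the quoted suggestion can be
sketched roughly as below. This is only an illustration under my own
assumptions: copy_words_then_tail is a made-up name, the comments map each
step to the quoted instructions, and it is not the i686_copy{in,out}() code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a C rendering of the quoted assembly, not libc code. */
void *copy_words_then_tail(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t words = len >> 2;        /* shrl $2,%ecx: number of 32-bit words */
    size_t tail = len & 3;          /* andl $3,%edx: leftover bytes (0..3) */

    while (words-- > 0) {           /* rep movsl */
        uint32_t w;
        memcpy(&w, s, sizeof w);    /* alignment-safe 32-bit load */
        memcpy(d, &w, sizeof w);    /* alignment-safe 32-bit store */
        s += 4;
        d += 4;
    }
    if (tail == 0)                  /* jne 1f not taken: skip the byte copy */
        return dst;
    while (tail-- > 0)              /* rep movsb */
        *d++ = *s++;
    return dst;                     /* movl %edi,%eax: return the destination */
}
```

The early return is the whole point of the change: when the length is a
multiple of 4, the second rep prefix is never executed.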
>
> Or even finish off with:
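> movb (%esi),%cl
> incl %esi /* advance source */
> movb %cl,(%edi) /* 32-bit address, not (%di) */
> incl %edi /* advance destination */
> decl %edx /* sets ZF for the jne below */
> jne 1b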
> popl %edi
> popl %esi
> ret
>
> David
Mixing word-sized and byte-sized register accesses is generally not a good
idea. The Intel manual says that it degrades performance, and I confirmed
that with my memcpy tests.
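If anyone wants to repeat this kind of measurement, a rough harness along
the following lines should do. It is a sketch, not the test program I used:
time_copy and copy_fn are hypothetical names, clock() is a coarse timer, and
the numbers will depend heavily on CPU, compiler, and buffer alignment.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any routine with memcpy's signature can be plugged in here. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/*
 * Hypothetical helper: returns the CPU seconds spent on `iters` copies of
 * `len` bytes, or -1.0 if `len` is too large or the copied data is wrong.
 */
double time_copy(copy_fn fn, size_t len, unsigned long iters)
{
    static unsigned char src[4096], dst[4096];
    size_t i;
    unsigned long k;
    clock_t start, end;

    if (len > sizeof src)
        return -1.0;
    for (i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)i;  /* recognizable test pattern */
    memset(dst, 0, sizeof dst);

    start = clock();
    for (k = 0; k < iters; k++)
        fn(dst, src, len);
    end = clock();

    if (iters > 0 && memcmp(dst, src, len) != 0)
        return -1.0;                /* the routine under test is broken */
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with lengths that are not multiples of 4 (1023, say) for each
variant of the tail loop is what exercises the byte-copy path.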
Jun-Young
--
Bang Jun-Young <junyoung@mogua.com>