Subject: Re: lib/35535: memcpy() is very slow if not aligned
To: None <port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org,>
From: David Laight <david@l8s.co.uk>
List: netbsd-bugs
Date: 02/03/2007 21:25:01
The following reply was made to PR port-amd64/35535; it has been noted by GNATS.
From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/35535: memcpy() is very slow if not aligned
Date: Sat, 3 Feb 2007 21:23:31 +0000
On Sat, Feb 03, 2007 at 02:25:02PM +0000, Kimura Fuyuki wrote:
>
> The real (what's real?) latency for rep instructions can be seen here (8.3):
> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
Hmmm... I'm not entirely certain some of the suggestions in that document are correct!
Some of the C code certainly isn't!
Page 17 suggests the use of:
#define FLOAT2INTCAST(f) (*((int *)(&f)))
for speeding up float comparisons against constants.
Someone hasn't read up on the C aliasing rules.
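For illustration only (this is not from the AMD document), here is a minimal sketch of getting at the same bits without breaking the aliasing rules. The names float_bits and float_is_positive are made up for this example, and it assumes a 32-bit IEEE 754 float:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 single. */
static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits");

/*
 * Hypothetical helper: copy the float's object representation into an
 * integer. memcpy between distinct objects is well defined, whereas
 * reading a float through an int * (as FLOAT2INTCAST does) is not.
 */
static inline int32_t float_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/*
 * Example of the kind of comparison the macro was meant to speed up:
 * a float is strictly positive exactly when its bit pattern, read as a
 * signed integer, is greater than zero (NaNs with a clear sign bit also
 * pass this test, so it is not a drop-in for "f > 0.0f").
 */
static inline int float_is_positive(float f)
{
    return float_bits(f) > 0;
}
```

An optimising compiler will usually turn that memcpy into a single register move, so there should be little or no cost over the cast, though that is worth checking in the generated code.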
Page 106 also suggests you need to be a lot more careful with your write-combining
code. Thinking further it probably can't be used without disabling interrupts (or
maybe making the write to each cache line a RAS sequence).
(But maybe I'm misunderstanding exactly what happens to the partially written line.)
e.g. the stuff in appendix B :-)
Page 167 suggests never (ok hardly ever) using the rep string opcodes.
The algorithm on pages 181+ looks like a good way to kill the I-cache.
Oh, and for good measure, the code has to run on Intel CPUs as well.
David
--
David Laight: david@l8s.co.uk