Subject: Re: copyin/out
To: Chris Gilbert <chris@paradox.demon.co.uk>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: port-arm
Date: 08/09/2002 09:16:27
On Fri, Aug 09, 2002 at 10:16:52AM +0100, Chris Gilbert wrote:
> Quick look over it, do you need to preload the addresses you're storing to?
> or does that cause it to fetch the tlb entries for speed? IE aren't you
> just filling the cache with stuff you're about to overwrite?
On some processors, in certain modes, the cache does not allocate a line
on a write-miss, and you essentially get write-through semantics. Prefetching
the destination into the cache means you always get write-back semantics,
and lets the cache clean the line it has to evict to make room for that data
before you actually *need* it.
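
To illustrate the idea only (this is not the actual copyin/copyout code, which
is ARM assembly), here is a minimal C sketch. It assumes a GCC- or
Clang-compatible compiler for __builtin_prefetch(); the function name
copy_prefetch_dst() and the 32-byte CACHE_LINE value are hypothetical and
would need to match the real CPU.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cache line size in bytes; hypothetical, check your CPU. */
#define CACHE_LINE 32

/*
 * Hypothetical helper: copy len bytes from src to dst one cache line at a
 * time, hinting the cache to fetch the *next* source and destination lines
 * while the current one is being copied.
 */
static void
copy_prefetch_dst(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len >= CACHE_LINE) {
		/* Second argument 1 = prefetch in anticipation of a write. */
		__builtin_prefetch(d + CACHE_LINE, 1);
		/* Second argument 0 = prefetch in anticipation of a read. */
		__builtin_prefetch(s + CACHE_LINE, 0);

		memcpy(d, s, CACHE_LINE);
		d += CACHE_LINE;
		s += CACHE_LINE;
		len -= CACHE_LINE;
	}

	/* Copy whatever is left over (less than one line). */
	if (len > 0)
		memcpy(d, s, len);
}
```

The prefetch is only a hint: the result of the copy is the same with or
without it, and whether it helps at all depends on the cache's write-miss
policy as described above.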
> Hmm, I see near enough that already on cats 1.6D.
> 1073741824 bytes transferred in 17.343 secs (61912115 bytes/sec)
Interesting. The performance characteristics of the old code were
VERY different on a 400MHz i80321 (XScale core). Indeed the old code
on my Shark can do:
1073741824 bytes transferred in 15.120 secs (71014670 bytes/sec)
and the new code on the Shark yields:
1073741824 bytes transferred in 8.447 secs (127115167 bytes/sec)
That is a SIGNIFICANT improvement.
--
-- Jason R. Thorpe <thorpej@wasabisystems.com>