Subject: Re: Xscale optimisations
To: David Laight <david@l8s.co.uk>
From: Richard Earnshaw <rearnsha@arm.com>
List: port-arm
Date: 10/14/2003 18:35:54
> > > Mmmm IIRC we only ever saw bursts of the memory bus for cache line writes.
> > > (Although it wasn't me driving the analyser that day.)
> >
> > Hmm, yes, I suspect I was mistaken on that. The SA110 timing apps note
> > does seem to confirm your observations.
>
> yes - we were expecting to see burst writes, but didn't....
>
> > > I know I got faster memcpy (on sa1100) by fetching the target buffer
> > > into the data cache (an lda offset by a magic number would do the trick,
> > > didn't stall since the target data was never used!)
> >
> > Which would be faster would probably depend on the relative
> > sequential/non-sequential times and the number of words to be written to a
> > line. Plus some compensation for the fact that other useful data will
> > likely be cast out of the cache. It is believable that 2(N+7S) < 8N (i.e.
> > 2.33S < N) for many memory systems and thus that fetching a line into
> > cache would most likely be more efficient than writing to memory that was
> > out of the cache.
>
> N = first, S = subsequent
Sorry, yes (N=Non-sequential, S=Sequential). It's terminology from old
ARM data sheets which talked about N, S and I (Internal) cycles.
A sequential cycle must follow either an N cycle or an S cycle and must be
at an ascending address (in this case wrap-around on the same CAS address
would be OK).
So a cache line fill (or drain) would look like
N-S-S-S-S-S-S-S
and individual stores would be
N-N-N...
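
As a rough sketch of the arithmetic quoted above, the comparison can be written out in C. The function names and the 8-word line size are assumptions made for illustration, and any cycle counts fed in are made-up values rather than measurements from a real memory system:

```c
#include <assert.h>

/* Words per cache line; 8 is assumed here to match the N-S-S-S-S-S-S-S pattern. */
enum { WORDS_PER_LINE = 8 };

/* Cost of one burst (line fill or drain): one N cycle, then S cycles
 * for each remaining word. n and s are cycle counts for the memory system. */
static unsigned burst_cost(unsigned n, unsigned s, unsigned words)
{
    assert(words > 0);
    return n + (words - 1) * s;
}

/* Cost of writing each word separately to memory outside the cache: N-N-N... */
static unsigned single_store_cost(unsigned n, unsigned words)
{
    return words * n;
}

/* Cost of fetching the target line into the cache and later draining it:
 * two bursts, i.e. 2(N+7S) for an 8-word line. This ignores the cost of
 * casting other useful data out of the cache. */
static unsigned allocate_cost(unsigned n, unsigned s, unsigned words)
{
    return 2 * burst_cost(n, s, words);
}

/* Non-zero when fetching the line first is cheaper, i.e. 2(N+7S) < 8N. */
static int allocate_is_cheaper(unsigned n, unsigned s)
{
    return allocate_cost(n, s, WORDS_PER_LINE)
         < single_store_cost(n, WORDS_PER_LINE);
}
```

With hypothetical timings of n=5, s=1 the fill plus drain costs 24 cycles against 40 for eight separate stores; with n=2, s=1 it is 18 against 16, so the individual stores win.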
R.