Subject: Re: new mips cache performance
To: Simon Burge <simonb@wasabisystems.com>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: port-mips
Date: 11/18/2001 10:06:41
On Mon, Nov 19, 2001 at 01:00:36AM +1100, Simon Burge wrote:
> I've tried a simple benchmark of building a pmax kernel three times in a
> row with kernels built both pre-merge and post-merge of the mips cache
> branch (using the "2001-11-14 18:00:00 UTC" and "2001-11-15 12:20:00
> UTC" date tags). The post-merge figures were slightly worse (about 30
> seconds slower over an hourish build). At Jason's suggestion, I removed
> the check for an L2 cache in the __mco_noargs and __mco_2args macros
> in <mips/cache.h> so that it always called the L2 cache ops and that
> shaved about 108 seconds off the average benchmark time making it about
> 75 seconds quicker than a pre-merge kernel.
Hm, okay. I'm kind of annoyed that the test is that expensive :-/
So, a couple of options here:
(1) Make noop L2 cache routines for platforms which don't
have them, and always let the code jump into the L2
routine.
(2) Do the pseudo-vector thing. Since the individual cache
primitives are too large to stuff into here, we would
have to copy cache-op-call-sites into the pvecs. This would
mean stack allocation, saving some regs, etc. in the pvecs.
I'm leaning towards (1), since, with the call sites staying as macros,
the compiler would have an easier time optimizing the code around them.
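For illustration, option (1) could look roughly like the sketch below. This
is only a rough, user-space model: the struct, the function names and the
CACHE_WBINV_ALL macro are hypothetical and are not the actual contents of
<mips/cache.h>, and the "cache routines" just bump counters so the control
flow can be checked. The point is that the call site no longer tests for an
L2 cache; platforms without one get a routine that simply returns.

```c
#include <stddef.h>

/*
 * Hypothetical ops table. The names are illustrative only and do not
 * match the real macros or structures in <mips/cache.h>.
 */
struct cache_ops {
	void (*l1_wbinv_all)(void);
	void (*l2_wbinv_all)(void);	/* never NULL under option (1) */
};

/* Counters standing in for real cache work, so the sketch can be tested. */
static unsigned int l1_calls, l2_calls;

static void l1_wbinv_all(void) { l1_calls++; }
static void l2_wbinv_all(void) { l2_calls++; }

/* No-op routine installed on platforms that have no L2 cache. */
static void l2_noop(void) { }

static struct cache_ops cache_ops;

/* Pick the L2 routine once at startup instead of testing at every call. */
static void
cache_ops_init(int have_l2)
{
	cache_ops.l1_wbinv_all = l1_wbinv_all;
	cache_ops.l2_wbinv_all = have_l2 ? l2_wbinv_all : l2_noop;
}

/* Call site: unconditional jump into the L2 routine, no presence check. */
#define CACHE_WBINV_ALL()						\
do {									\
	(*cache_ops.l1_wbinv_all)();					\
	(*cache_ops.l2_wbinv_all)();					\
} while (/*CONSTCOND*/0)
```

The cost on L2-less machines becomes one indirect call to an empty function,
in exchange for dropping the conditional at every call site.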
> The tests were run on a DECsystem 5000/260 (R4400 at 60MHz, 16k L1
> Icache, 16k L1 Dcache and 1MB L2 cache, 192MB RAM) with source and
> kernel compile directory on local disk, using a GENERIC kernel that
> has both the MIPS1 and MIPS3 options.
BTW, what's the line size of our L2 cache? 128 bytes? We could probably
squeeze some more out by writing 128-byte optimized L2 cache ops (which
unroll the loop somewhat).
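A rough sketch of what such an unrolled range op might look like, assuming
the line size really is 128 bytes (that is still an open question here).
Everything in it is hypothetical: l2_line_op() is a stand-in that only counts
lines where the real code would issue a single cache instruction, and the
function name and the unroll factor of four are arbitrary choices for the
example.

```c
#include <stdint.h>

#define L2_LINE_SIZE	128u	/* assumed line size, not confirmed */

/* Counts simulated per-line operations so the loop can be checked. */
static unsigned long lines_hit;

/* Stand-in for one cache instruction applied to the line at va. */
static void
l2_line_op(uintptr_t va)
{
	(void)va;
	lines_hit++;
}

/* Hypothetical range op: touch every 128-byte line in [va, va + size). */
static void
l2_wbinv_range_128(uintptr_t va, uintptr_t size)
{
	const uintptr_t mask = ~(uintptr_t)(L2_LINE_SIZE - 1);
	uintptr_t eva = (va + size + L2_LINE_SIZE - 1) & mask;	/* round up */

	va &= mask;						/* round down */

	/* Unrolled body: four lines (512 bytes) per iteration. */
	while (eva - va >= 4 * L2_LINE_SIZE) {
		l2_line_op(va + 0 * L2_LINE_SIZE);
		l2_line_op(va + 1 * L2_LINE_SIZE);
		l2_line_op(va + 2 * L2_LINE_SIZE);
		l2_line_op(va + 3 * L2_LINE_SIZE);
		va += 4 * L2_LINE_SIZE;
	}

	/* Tail: whatever lines are left, one at a time. */
	while (va < eva) {
		l2_line_op(va);
		va += L2_LINE_SIZE;
	}
}
```

Hard-coding the line size lets the per-line offsets become constants, which
is where any gain over a generic loop would come from.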
--
-- Jason R. Thorpe <thorpej@wasabisystems.com>