Subject: Re: m68k bcopy implementation performance results
To: None <port-m68k@NetBSD.ORG>
From: Ignatios Souvatzis <is@beverly.rhein.de>
List: port-m68k
Date: 12/18/1996 21:26:46
> Thanks to all who sent me results of the bcopy benchmark --- sorry I
> didn't mention that it might take days to execute.
Hehe, I scaled it down by a factor of 10 or 100 for some of the tests, at
least on the 68030 machine... I don't have days of uptime. All my
machines are in my bedroom, and I'm not deaf yet, nor do I want to be :-)
> I was a bit surprised by the results. The two optimized versions were
> between 65 and 105% faster on the '020, '040, and '060. But there was
> barely any improvement on '030 systems. I don't know why --- the
> benchmarks should be large enough to eliminate (or at least make
> visible) any data cache effects.
Well, my results clearly show some 15% improvement (if I recall correctly;
I sent the details to jtc) on the 68030 ... maybe my machine has different
memory access times (5-2-2-2 burst access) than the other '030 testers' machines.
Additionally, I did some weird tests manipulating the A3000's RAM controller.
If I turn off burst accesses, performance for long copies increases by
another 10 or 15%. I suspect this is caused by write allocation. So my
suggestion is to switch write allocation off before copypage/zeropage, and
switch it back on afterwards, on the 030 (same code as the 020, which ignores
this bit in the CACR). I'll try to benchmark this "soon".
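To illustrate the suggestion, here is a minimal sketch in C of wrapping a
page copy with "write allocation off / back on". It is not kernel code:
read_cacr(), write_cacr() and copypage_no_wa() are hypothetical names, and
the register is simulated with a plain variable so the sketch runs on any
host. On a real 68030 the two accessors would be movec instructions executed
in supervisor mode. The mask assumes the write-allocate bit is bit 13 of the
68030 CACR; the initial register value is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: WA (write allocate) is bit 13 of the 68030 CACR. */
#define CACR_WA 0x2000u

/* Simulated register; the starting value is arbitrary (WA set). */
static uint32_t fake_cacr = 0x2101u;

/* Hypothetical accessors; real code would use movec to/from %cacr. */
static uint32_t read_cacr(void)        { return fake_cacr; }
static void     write_cacr(uint32_t v) { fake_cacr = v; }

/* Copy one page with write allocation disabled, then restore the CACR. */
static void copypage_no_wa(void *dst, const void *src, size_t pagesize)
{
    uint32_t saved = read_cacr();

    write_cacr(saved & ~CACR_WA);   /* write allocation off */
    memcpy(dst, src, pagesize);     /* stand-in for the real copy loop */
    write_cacr(saved);              /* restore previous setting */
}
```

Saving and restoring the whole register, rather than blindly setting the bit
afterwards, keeps the sketch correct even if write allocation was already
off when the copy started.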
My results further show _NO_ improvement on the 68060 for Duff's device if
branch prediction is on (but a factor of two improvement over the old
bcopy in the branch-prediction-off case), and a slowdown of a few percent
for the simple unrolled loop relative to the old code and Duff's device if
branch prediction is on. (Yes, it is on now for the 68060 --- I finally found
time to run a few tests and be sure I found that caching bug which hit
me half a year ago).
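For readers who have not seen the two loop styles being compared, here is a
rough C sketch of each. These are not the benchmarked routines (which are
not shown in this thread); copy_unrolled() and copy_duff() are hypothetical
names, and the unroll factors (4 and 8) are arbitrary choices for
illustration.

```c
#include <stddef.h>

/* Simple unrolled loop: 4 bytes per iteration, then a byte-wise tail. */
static void copy_unrolled(unsigned char *dst, const unsigned char *src,
                          size_t n)
{
    while (n >= 4) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
        n -= 4;
    }
    while (n-- > 0)
        *dst++ = *src++;
}

/* Duff's device: the switch jumps into the middle of the unrolled body
 * to handle the n % 8 leftover bytes on the first pass. */
static void copy_duff(unsigned char *dst, const unsigned char *src,
                      size_t n)
{
    size_t passes;

    if (n == 0)
        return;
    passes = (n + 7) / 8;
    switch (n % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

Both variants spend most of their time in one backward branch per unrolled
group, which may be part of why the gap narrows once that branch is
predicted well.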
Just thought people might like to hear about this.
Regards,
Ignatios Souvatzis