Subject: Re: Lock benchmarks
To: None <tech-kern@netbsd.org>
From: Jason R Thorpe <thorpej@wasabisystems.com>
List: tech-kern
Date: 09/19/2002 12:24:01
On Tue, Sep 17, 2002 at 10:03:00AM +1200, Gregory McGarry wrote:
> I'd appreciate receiving the timings on different CPUs. I guess
> the results are somewhat academic since there is hardly any
> difference between the locking schemes. But numbers are cool.
Here are timings for a couple of ARM architecture platforms.
StrongARM SA-110, 233MHz with DC21285 system controller:
Registering restartable atomic sequences
Timing unlock overhead
Timing RAS locks
Timing CPU locks (inlined)
Timing CPU locks (not inlined)
unlock overhead: 1.603476 s (0.016035 us/loop)
RAS: 4.402919 s (0.044029 us/loop)
cpu locks (inlined): 44.463102 s (0.444631 us/loop)
cpu locks (not inlined): 50.253711 s (0.502537 us/loop)
Intel i80321 (XScale core), 400MHz
Registering restartable atomic sequences
Timing unlock overhead
Timing RAS locks
Timing CPU locks (inlined)
Timing CPU locks (not inlined)
unlock overhead: 0.751238 s (0.007512 us/loop)
RAS: 3.756992 s (0.037570 us/loop)
cpu locks (inlined): 2.254331 s (0.022543 us/loop)
cpu locks (not inlined): 7.020705 s (0.070207 us/loop)
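To compare the two CPUs, it helps to turn the us/loop figures into cycles per loop. The following is only a rough sketch: it assumes the cores really run at the nominal 233MHz and 400MHz quoted above, and the names (struct timing, cycles_per_loop, print_cycles) are made up for illustration and are not part of the benchmark.

```c
#include <assert.h>
#include <stdio.h>

/* One line of benchmark output, copied from the results above. */
struct timing {
    const char *platform;
    const char *test;
    double clock_mhz;    /* nominal core clock (assumed exact) */
    double us_per_loop;  /* as printed by the benchmark */
};

static const struct timing timings[] = {
    { "SA-110", "unlock overhead",         233.0, 0.016035 },
    { "SA-110", "RAS",                     233.0, 0.044029 },
    { "SA-110", "cpu locks (inlined)",     233.0, 0.444631 },
    { "SA-110", "cpu locks (not inlined)", 233.0, 0.502537 },
    { "i80321", "unlock overhead",         400.0, 0.007512 },
    { "i80321", "RAS",                     400.0, 0.037570 },
    { "i80321", "cpu locks (inlined)",     400.0, 0.022543 },
    { "i80321", "cpu locks (not inlined)", 400.0, 0.070207 },
};

/* 1 MHz is one cycle per microsecond, so cycles = us * MHz. */
static double cycles_per_loop(double us_per_loop, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return us_per_loop * clock_mhz;
}

/* Print every timing as an approximate cycle count. */
static void print_cycles(void)
{
    size_t i;

    for (i = 0; i < sizeof(timings) / sizeof(timings[0]); i++)
        printf("%-8s %-26s %8.2f cycles/loop\n",
            timings[i].platform, timings[i].test,
            cycles_per_loop(timings[i].us_per_loop,
                timings[i].clock_mhz));
}
```

Calling print_cycles() from a main() gives roughly 10.3 cycles/loop for RAS against 103.6 for inlined CPU locks on the SA-110, and roughly 15.0 against 9.0 on the i80321. These figures still include the loop and unlock overhead.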
For the XScale case, RAS is paying the penalty of the branch (3 cycles).
On this particular XScale system, the memory controller is built into the
CPU and the memory the swp insn is manipulating is cacheable, so the swp
insn is cheap. It may well be different on other XScale-based platforms
(e.g. an i80200 + i80312 platform -- I just can't test that easily right now).
I still think RAS is clearly a win here, because of the obvious benefit on
non-XScale platforms (on the SA-110 above, RAS is roughly 10x faster than
the inlined CPU locks).
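For readers who haven't looked at the two schemes, here is a rough sketch of the difference in portable C. It is not the benchmark code. The "CPU lock" uses a C11 atomic exchange as a stand-in for the ARM swp insn, and the RAS-style function only shows the shape of the load/store sequence. The names sketch_lock_t, sketch_try_lock, sketch_unlock and ras_style_test_and_set are made up for illustration.

```c
#include <stdatomic.h>

/* Hypothetical lock type for this sketch: 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} sketch_lock_t;

/*
 * "CPU lock" flavour: one atomic exchange, which is the portable
 * analogue of swp.  Returns 1 if we got the lock, 0 if it was held.
 */
static int sketch_try_lock(sketch_lock_t *l)
{
    return atomic_exchange_explicit(&l->locked, 1,
        memory_order_acquire) == 0;
}

static void sketch_unlock(sketch_lock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/*
 * RAS flavour: an ordinary load followed by an ordinary store.
 * This is NOT atomic by itself.  It is only safe on a uniprocessor,
 * and only if the kernel has been told the address range of the
 * sequence so it can restart it from the top when the thread is
 * preempted in the middle.  Real sequences are normally written in
 * assembly so the compiler cannot rearrange them.
 * Returns the previous value of the lock word.
 */
static int ras_style_test_and_set(volatile int *lockp)
{
    int old;

    /* --- start of the would-be restartable sequence --- */
    old = *lockp;
    *lockp = 1;
    /* --- end of the would-be restartable sequence --- */
    return old;
}
```

The point of RAS is that the sequence contains no atomic memory operation at all, which is why it does so well on a core like the SA-110 where swp is expensive. On NetBSD the registration step (the "Registering restartable atomic sequences" line in the output above) goes through rasctl(2). The sketch leaves that out so it stays portable.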
--
-- Jason R. Thorpe <thorpej@wasabisystems.com>