Subject: lock performance on i386
To: None <tech-smp@netbsd.org>
From: David Laight <david@l8s.co.uk>
List: tech-smp
Date: 09/19/2002 19:30:05
The spin lock code is currently:
static __inline void
__cpu_simple_lock(__cpu_simple_lock_t *alp)
{
	int __val = __SIMPLELOCK_LOCKED;

	do {
		__asm __volatile("xchgl %0, %2"
		    : "=r" (__val)
		    : "0" (__val), "m" (*alp));
	} while (__val != __SIMPLELOCK_UNLOCKED);
}
This means that if the lock is contended, the cpu sits in a loop
doing continuous, expensive, locked operations on the bus, which
(IIRC) force a cache snoop cycle on all the cpus.
It would be much more efficient to use the following:
(in asm, but not __asm...)
ENTRY(__cpu_simple_lock)
	movl	4(%esp),%edx
	movl	$__SIMPLELOCK_LOCKED,%eax
1:	xchgl	(%edx),%eax		# try to take the lock
	testl	%eax,%eax
	jne	2f			# it was already held: spin
	ret
2:	pause				# for P4
	cmpl	(%edx),%eax		# just read until lock available
	jne	1b
	jmp	2b
(If inlined, you might want the 'ret' at the end...)
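For illustration, an untested sketch of how the same
test-and-test-and-set loop might look kept inline as __asm
(assuming __SIMPLELOCK_UNLOCKED is 0 and a 32-bit lock word;
'rep; nop' is the byte encoding of 'pause', for assemblers that
don't know the mnemonic):

static __inline void
__cpu_simple_lock(__cpu_simple_lock_t *alp)
{
	int __val;

	for (;;) {
		/* locked exchange: try to take the lock */
		__val = __SIMPLELOCK_LOCKED;
		__asm __volatile("xchgl %0, %1"
		    : "+r" (__val), "+m" (*alp) : : "memory");
		if (__val == __SIMPLELOCK_UNLOCKED)
			return;
		/* spin with plain reads until the lock looks free */
		do {
			__asm __volatile("rep; nop");	/* 'pause' */
		} while (*(volatile int *)alp != __SIMPLELOCK_UNLOCKED);
	}
}

The plain-read inner loop only touches the local cache line until
another cpu releases the lock, so no bus locks are issued while
spinning.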
Also, under debug, the lower loop could be replaced with a function
call that does some checks for lock contention - e.g. detecting when
it has spun for too long.
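Something along these lines, say (untested; SPINLOCK_SPINOUT and
the panic are only illustrative):

#define	SPINLOCK_SPINOUT	0x1000000	/* arbitrary threshold */

void
__cpu_simple_lock_spin(__cpu_simple_lock_t *alp)
{
	int spins = 0;
	int old;

	do {
		/* plain reads only; complain if the lock seems wedged */
		while (*(volatile int *)alp != __SIMPLELOCK_UNLOCKED) {
			__asm __volatile("rep; nop");	/* 'pause' */
			if (++spins > SPINLOCK_SPINOUT)
				panic("__cpu_simple_lock: %p spun out", alp);
		}
		/* lock looks free: try to take it */
		old = __SIMPLELOCK_LOCKED;
		__asm __volatile("xchgl %0, %1"
		    : "+r" (old), "+m" (*alp) : : "memory");
	} while (old != __SIMPLELOCK_UNLOCKED);
}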
David
--
David Laight: david@l8s.co.uk