Subject: lock performance on i386
To: None <tech-smp@netbsd.org>
From: David Laight <david@l8s.co.uk>
List: tech-smp
Date: 09/19/2002 19:30:05
The spin lock code is currently:

static __inline void 
__cpu_simple_lock(__cpu_simple_lock_t *alp)
{
	int __val = __SIMPLELOCK_LOCKED;
 
	do {
		__asm __volatile("xchgl %0, %2"
			: "=r" (__val)
			: "0" (__val), "m" (*alp));
	} while (__val != __SIMPLELOCK_UNLOCKED);
}

This means that if the lock is contended, the cpu sits there doing
continuous, expensive, locked operations on the bus - each locked
xchgl, IIRC, forces a cache snoop cycle on all the cpus.

It would be much more efficient to use the following:
(in asm, but not __asm...)

ENTRY(__cpu_simple_lock)
	movl	4(%esp),%edx
	movl	$__SIMPLELOCK_LOCKED,%eax
1:	xchgl	(%edx),%eax	# implicitly locked; %eax gets old value
	testl	%eax,%eax	# zero => lock was free, we now hold it
	jne	2f
	ret
2:	pause			# for P4
	cmpl	(%edx),%eax	# just read until lock available
	jne	1b
	jmp	2b

(if this is inlined, you'd want the 'ret' path at the end...)
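For the inline version, something like this ought to do - just a
sketch, untested, and assuming __cpu_simple_lock_t is declared
volatile so the plain read in the inner loop isn't optimized away:

static __inline void
__cpu_simple_lock(__cpu_simple_lock_t *alp)
{
	int __val;

	for (;;) {
		__val = __SIMPLELOCK_LOCKED;
		/* xchgl is implicitly locked; "+m" tells gcc the
		 * memory is both read and written */
		__asm __volatile("xchgl %0, %1"
			: "+r" (__val), "+m" (*alp));
		if (__val == __SIMPLELOCK_UNLOCKED)
			return;
		/* spin with plain reads - no locked bus cycles */
		while (*alp != __SIMPLELOCK_UNLOCKED)
			__asm __volatile("pause");	/* 'rep; nop' on older gas */
	}
}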
Also, under a debug build, the lower loop could be made a function
call - that function could check for lock contention and complain
if it spins for too long.
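Something along these lines, say - the spinout limit and the names
here are invented for illustration:

#define	SPINLOCK_SPINOUT	0x10000000

/* Hypothetical debug version of the read-only spin loop. */
static void
__cpu_simple_lock_spin(__cpu_simple_lock_t *alp)
{
	unsigned int count;

	for (count = 0; *alp != __SIMPLELOCK_UNLOCKED; count++) {
		__asm __volatile("pause");
		if (count > SPINLOCK_SPINOUT)
			panic("__cpu_simple_lock: spun out");
	}
}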

	David

-- 
David Laight: david@l8s.co.uk