Subject: Re: kern/32035: APIC timer help
To: Simon Burge <simonb@wasabisystems.com>
From: Frederick Bruckman <fredb@immanent.net>
List: tech-kern
Date: 12/19/2005 11:38:38
In article <20051207070244.E209723402@thoreau.thistledown.com.au>,
	Simon Burge <simonb@wasabisystems.com> writes:
> Simon Burge wrote:
> 
>> [ local APIC timer problem discussed ]
> 
> I've come to the conclusion that on the problematic machines the APIC
> timer just doesn't fire with a constant period, for some unknown
> reason, and that there's nothing we can really do about it.  The
> patch at
> 
>    ftp://ftp.netbsd.org/pub/NetBSD/misc/simonb/mp-time-hack.diff
> 
> at least lets time run stably.  The main comment at the top of the patch
> describes what it does:
> 
> 	* Some MP systems have been observed to not have a
> 	* stable local APIC timer interrupt.  We count the
> 	* number of TSC cycles since the last call to
> 	* lapic_clockintr(), and if it has been longer than
> 	* expected we add in some extra time for hardclock()
> 	* to add in when it computes the next value of the
> 	* system "time" variable.  Note that we don't skip
> 	* time backwards - early arrivals to lapic_clockintr()
> 	* have only been observed sporadically, and we'll
> 	* soon catch up.
> 
> Longer term, switching to timecounters is a more correct fix since they
> base time calculations on the TSC counter and not the period of the
> clock interrupt.  Using HPET timers where available will also help.

That sounds really interesting. The problem I see with your theory is
that it's the same APIC timer in both the one-CPU and two-CPU cases.
I suspect some latency in the IPI/read-TSC code path.  Maybe the
"rdtsc" instruction simply isn't in the icache on the slow cycles?
Experimenting as you suggest would help answer the question.
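
For concreteness, here is how I read the comment quoted above, as a
rough userland sketch in C.  This is not the code from
mp-time-hack.diff; the names (tick_state, extra_usec) and the idea of
passing the TSC value in as an argument are my own assumptions, made
only so the arithmetic can be tried outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state; none of these names come from the actual patch. */
struct tick_state {
	uint64_t last_tsc;        /* TSC value seen at the previous tick */
	uint64_t cycles_per_tick; /* expected TSC cycles between ticks */
	uint64_t usec_per_tick;   /* nominal tick length in microseconds */
};

/*
 * Return the extra microseconds to credit for this tick.
 * A late tick yields the overshoot scaled to microseconds; an early or
 * on-time tick yields 0, so time is never stepped backwards.
 */
uint64_t
extra_usec(struct tick_state *ts, uint64_t now_tsc)
{
	uint64_t elapsed;

	assert(ts->cycles_per_tick != 0);

	elapsed = now_tsc - ts->last_tsc;
	ts->last_tsc = now_tsc;

	if (elapsed <= ts->cycles_per_tick)
		return 0;

	return (elapsed - ts->cycles_per_tick) * ts->usec_per_tick /
	    ts->cycles_per_tick;
}
```

With a 2 GHz TSC and hz=100 (20000000 cycles and 10000 usec per tick),
a tick that arrives after 30000000 cycles would be credited 5000 extra
microseconds under this sketch.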
 
> I'd be curious if anyone else with SMP boxes that have time keeping
> problems could test this out and see if it fixes the time problem.

It helps! The frequency (as logged in "/var/log/loopstats") jumps to
a few hundred PPM under heavy disk I/O, but then settles back down
without stepping. (Patch applied to netbsd-3-0.) Yet on the same machine
with a non-SMP kernel (2.1 to 3.0_RC6), the frequency varies slowly from
about 5.0 to 11.0 PPM, depending on ambient temperature, so it's clearly
not a complete fix.
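
For anyone who wants to watch the same number, here is a small C helper
that pulls the frequency column out of one loopstats line.  It assumes
the usual ntpd layout (MJD, seconds past midnight, offset in seconds,
frequency offset in PPM, then the remaining fields); the function name
is made up for illustration.

```c
#include <stdio.h>

/*
 * Extract the frequency offset (PPM) from one loopstats line.
 * Assumed layout: MJD, seconds past midnight, offset (s), frequency (PPM), ...
 * Returns 0 on success, -1 if the line does not parse.
 */
int
loopstats_freq(const char *line, double *freq_ppm)
{
	long mjd;
	double secs, offset;

	if (sscanf(line, "%ld %lf %lf %lf", &mjd, &secs, &offset,
	    freq_ppm) != 4)
		return -1;
	return 0;
}
```

Feeding it each line read with fgets() and printing the result makes the
jumps under disk load easy to spot.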


Frederick