My latest xen_clock.c revision is now revealing what may be a real clue to identifying the mysterious timekeeping regression, including what exactly happens after the approximately 7.5 days of uptime.  I was going cross-eyed and zombie-brained staring at 15-digit nanosecond values with odd 533-second intervals between the log lines from my debug printf in xen_rtc_set(), so I changed the interval to 10 seconds (live, with sysctl, on a dom0 that was experiencing the problem) and something immediately jumped out plain as day.

Sometime (apparently randomly) between 650k and 660k seconds (~7.5 days) after boot, the result from nanouptime() begins to "drag" very significantly behind Xen "system time", almost stopping for tens of seconds at a time, as if it were suddenly converting the timecounter value to nanoseconds with a radically different frequency (i.e. a different th_scale), or perhaps not properly retrieving the timecounter value at all (i.e. in xen_get_timecount()).  Sometimes it keeps time normally again for a brief period, but then drags to nearly a stop once more.  Things go wonky very quickly, and nothing can be done to recover short of a reboot.  Oddly, despite nanouptime() dragging, the callout still fires almost exactly every xen_timepush.ticks (as confirmed by the conserver log timestamps)!

What's weird is that, in a dom0, both Xen system time and nanouptime() (using the "xen_system_time" timecounter) are based on "raw" RDTSC values, and both use Xen's calibration of the TSC frequency.  How can they produce such stunningly different results?  Maybe the bug is somewhere in timecounter(9)?  It doesn't seem like it can be in xen_clock.c.

I've also realised, after more thorough examination of the Xen kernel code, that it doesn't matter very much whether one uses nanouptime() or xen_global_systime_ns() for the "system_time" value passed to XENPF_settime, as it's only used by Xen to work backwards again to find the system boot time (i.e. 
Xen's "wall clock time").  This has no possible bearing on the timekeeping regression bug either.  Xen could have used its own get_s_time(), or the NOW() wrapper macro around it, but there would be a slight execution delay in doing so: it is slightly more accurate for the dom0 to sample the value just before the hypercall than for Xen to sample it itself (at least, that's the only explanation I can think of for an API that requires the caller to hand back to Xen a value that Xen already has equal or better access to).

I'm no further along in finding the root cause yet, though I think I can now firmly state that the problem is not in the Xen kernel (even if it was a more recent change in the Xen kernel that exposed the bug in NetBSD).

I'll post my current copy of xen_clock.c as a reply; see the debug printf in xen_rtc_set().  I can post a sample set of log lines too if anyone would like to see them.  I'll hopefully have more evidence from new log lines within the coming week.

One new thing I did was to try to identify the magnitude of the differences between the TSC registers on different cores, by adding a new event counter that is incremented only when the current TSC-as-ns value is one or more ticks behind the last retrieved "global_ns" value.  On the most-used cores, in dom0 only, this event happens very, very frequently: sometimes at more than 1.5*HZ, and usually at least at HZ rate.  If I'm interpreting this correctly, how the heck can one TSC register be a whole tick or more behind another so often, and why is this seen at high rates only in dom0?  No doubt there can be some skew between TSC registers, especially between cores in different sockets, but I see this happen immediately, right from boot.  My xen_clock.c accounts for these differences in exactly the same way FreeBSD does, so far as I can tell, and yet FreeBSD does not experience the same overall problem.

--
Greg A. Woods <gwoods%acm.org@localhost>
Kelowna, BC  +1 250 762-7675
RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>
Avoncote Farms <woods%avoncote.ca@localhost>