My latest xen_clock.c revision is now revealing what may be a real clue to identifying the mysterious timekeeping regression, including what exactly happens after the approximately 7.5 days of uptime.  I was going cross-eyed and zombie-brained staring at 15-digit nanosecond values with odd 533-second intervals between the log lines from my debug printf in xen_rtc_set(), so I changed the interval to 10 seconds (live, with sysctl, on a dom0 that was experiencing the problem) and something immediately jumped out plain as day.

Sometime (apparently randomly) between 650k and 660k seconds (~7.5 days) after boot, the result from nanouptime() begins to "drag" very significantly behind Xen "system time", almost stopping for tens of seconds at a time, as if it were suddenly converting the timecounter value to nanoseconds with a radically different frequency (i.e. a different th_scale), or perhaps not properly retrieving the timecounter value at all (i.e. in xen_get_timecount()).  Sometimes it keeps time normally again for a brief period, but then drags to nearly a stop once more.  Things go wonky very quickly, and nothing can be done to recover short of a reboot.  Oddly, despite nanouptime() dragging, the callout still fires almost exactly every xen_timepush.ticks (as confirmed by the conserver log timestamps)!

What's weird is that, in a dom0, both Xen system time and nanouptime() (using the "xen_system_time" timecounter) are based on "raw" RDTSC values, and both use Xen's calibration of the TSC frequency.  How can they produce such stunningly different results?  Maybe the bug is somewhere in timecounter(9)?  It doesn't seem like it can be in xen_clock.c.

I've also realised, after more thorough examination of the Xen kernel code, that it doesn't matter very much whether one uses nanouptime() or xen_global_systime_ns() for the "system_time" value passed to XENPF_settime, as it's only used by Xen to work backwards again to find the system boot time (i.e. 
Xen's "wall clock time").  This has no possible bearing on the timekeeping regression bug either.  Xen could have used its own get_s_time(), or the NOW() wrapper macro around it, but there would be a slight execution delay in doing so: it is slightly more accurate for the dom0 to sample the value just before the hypercall than for Xen to sample it itself (at least, that's the only explanation I can think of for an API that requires the caller to hand back to Xen a value that Xen already has equal or better access to).

I'm no further along in finding the root cause yet, though I think I can now firmly state that the problem is not in the Xen kernel (even if it was a more recent change in the Xen kernel that exposed the bug in NetBSD).

I'll post my current copy of xen_clock.c as a reply; see the debug printf in xen_rtc_set().  I can post a sample set of log lines too if anyone would like to see them.  I'll hopefully have more evidence from new log lines within the coming week.

One new thing I did was to try to identify the magnitude of the differences between the TSC registers on different cores, by adding a new event counter that is incremented only when the current TSC-as-ns value is one or more ticks behind the last retrieved "global_ns" value.  On the most-used cores, in dom0 only, this event happens very, very frequently: sometimes at more than 1.5*HZ, and usually at least at HZ rate.  If I'm interpreting this correctly, how the heck can one TSC register be a whole tick or more behind another so often, and why is this seen at high rates only in dom0?  No doubt there can be some skew between TSC registers, especially between cores in different sockets, but I see this happen immediately, right from boot.  My xen_clock.c accounts for these differences in exactly the same way FreeBSD does, so far as I can tell, and yet FreeBSD does not experience the same overall problem.

--
Greg A. Woods <gwoods%acm.org@localhost>
Kelowna, BC  +1 250 762-7675
RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>
Avoncote Farms <woods%avoncote.ca@localhost>