Port-xen archive


Re: timekeeping regression

At Fri, 14 Feb 2025 13:58:38 -0800, "Greg A. Woods" <woods%planix.ca@localhost> wrote:
Subject: Re: timekeeping regression
> The problem begins somewhere between Xen-4.11 and Xen-4.18 (probably
> since 4.13 actually, though that'll be proven in about a week's time).
> I've reinstalled 4.11 on one of the machines and it has been running
> with a stable clock for nearly 15 days now, and I recall previously
> having it run for much longer without problem.

So, yes, 4.13 is indeed continuing to keep good time well after 7.5 days
of uptime.  I guess I should try 4.15.

BTW, Andrew Cooper mentioned to me on xen-devel:

	Time handling is a known swamp.  I can believe something has
	changed since 4.13, but I wouldn't say it was working back then

It still has (and had) a high rate of "skew" between TSC counters on
different vCPUs, which shows up as hits in the algorithm in my new
xen_clock.c that keeps global_ns (xen_global_systime_ns_stamp, used for
the xen_system_time timecounter) from going backwards:

# vmstat -e | fgrep xen | fgrep -v xenev0
vcpu0 xen missed hardclock                               538894    0 intr
vcpu0 xen global_ns prevented from running backwards   45972839   66 intr
vcpu1 xen missed hardclock                               989203    1 intr
vcpu1 xen global_ns prevented from running backwards   71521339  104 intr
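
For readers unfamiliar with the technique, the "global_ns prevented from
running backwards" counter corresponds to a clamp of roughly the
following shape.  This is only a minimal sketch in portable C11, not the
code in xen_clock.c: the names monotonic_ns, global_ns and
backwards_prevented are hypothetical stand-ins, and the real kernel code
would use its own atomic primitives and event counters.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the variables used in xen_clock.c. */
static _Atomic uint64_t global_ns;		/* latest time handed out */
static _Atomic uint64_t backwards_prevented;	/* times a reading was clamped */

/*
 * Take a per-vCPU reading (local_ns) and return a value that never
 * goes backwards relative to what any other vCPU has already seen.
 */
uint64_t
monotonic_ns(uint64_t local_ns)
{
	uint64_t seen = atomic_load(&global_ns);

	/* Try to advance the global stamp while our reading is newer. */
	while (local_ns > seen) {
		/* On failure, seen is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&global_ns, &seen, local_ns))
			return local_ns;
	}

	/* Our reading is behind another vCPU's: count it and clamp. */
	if (local_ns < seen)
		atomic_fetch_add(&backwards_prevented, 1);
	return seen;
}
```

With this shape, every reading from a vCPU whose TSC-derived time lags
the most recently published value bumps the counter once, which is why
skew between vCPUs shows up as a steady rate of hits.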

One of the domUs (running with a XEN_CLOCK_DEBUG kernel) has reported a
slew of quite odd "hardclock jumped past timecounter max" events,
something I've never seen before.

Its system clock, with ntpd, is still keeping very good time though.

[ 461875.9597960] WARNING: hardclock jumped past timecounter max 545582329112350ns (545617865148933 -> 35536036583), exceeding maximum of 4294967295ns for timecounter(9)
[[ ... repeating with adjustments until finally ... ]]
[ 461897.6710341] WARNING: hardclock jumped past timecounter max 545634329112112ns (545639575735115 -> 5246623003), exceeding maximum of 4294967295ns for timecounter(9)

It happened again a little while later:

[ 462019.2340306] WARNING: hardclock jumped past timecounter max 545582329112112ns (545761131143311 -> 178802031199), exceeding maximum of 4294967295ns for timecounter(9)
[[ ... repeating with adjustments until finally ... ]]
[ 462019.2560543] WARNING: hardclock jumped past timecounter max 545756329112112ns (545761153683767 -> 4824571655), exceeding maximum of 4294967295ns for timecounter(9)
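
The figures in these warnings are at least self-consistent: in each of
the four lines quoted, the last number in the parentheses equals the
first number in the parentheses minus the number before them, and the
limit of 4294967295ns is 2^32 - 1 ns, or about 4.29 seconds.  The sketch
below checks that arithmetic.  The field names (prev_ns, now_ns,
delta_ns) are my reading of the message layout and are an assumption,
not something taken from the kernel source.

```c
#include <stddef.h>
#include <stdint.h>

/* 2^32 - 1 ns, the maximum printed in the warnings. */
#define TC_MAX_NS	UINT64_C(4294967295)

/* Assumed layout: "... max <prev_ns>ns (<now_ns> -> <delta_ns>) ..." */
struct jump {
	uint64_t prev_ns;
	uint64_t now_ns;
	uint64_t delta_ns;
};

/* The four warning lines quoted above, copied verbatim. */
const struct jump jumps[] = {
	{ UINT64_C(545582329112350), UINT64_C(545617865148933), UINT64_C(35536036583) },
	{ UINT64_C(545634329112112), UINT64_C(545639575735115), UINT64_C(5246623003) },
	{ UINT64_C(545582329112112), UINT64_C(545761131143311), UINT64_C(178802031199) },
	{ UINT64_C(545756329112112), UINT64_C(545761153683767), UINT64_C(4824571655) },
};
const size_t njumps = sizeof(jumps) / sizeof(jumps[0]);

/* True if the printed delta is exactly now_ns - prev_ns. */
int
jump_consistent(const struct jump *j)
{
	return j->now_ns - j->prev_ns == j->delta_ns;
}

/* True if the delta is larger than the printed timecounter maximum. */
int
jump_exceeds_max(const struct jump *j)
{
	return j->delta_ns > TC_MAX_NS;
}
```

Both predicates hold for all four entries, so even the final warning in
each burst still reports a gap of more than 4.29 seconds.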

# vmstat -e | fgrep xen | fgrep -v xenev0
vcpu0 xen missed hardclock                                 6862    0 intr
vcpu0 xen local_ns one tick or more behind global_ns          3    0 intr
vcpu0 xen global_ns prevented from running backwards    3870125    6 intr
vcpu1 xen missed hardclock                                 1771    0 intr
vcpu1 xen local_ns one tick or more behind global_ns          1    0 intr
vcpu1 xen global_ns prevented from running backwards    4417397    7 intr
vcpu2 xen missed hardclock                                 2460    0 intr
vcpu2 xen local_ns one tick or more behind global_ns          1    0 intr
vcpu2 xen global_ns prevented from running backwards    3436860    5 intr
vcpu3 xen missed hardclock                                59699    0 intr
vcpu3 xen global_ns prevented from running backwards    4839043    8 intr
vcpu3 xen hardclock jumped past timecounter max              53    0 intr
vcpu4 xen missed hardclock                                 2423    0 intr
vcpu4 xen global_ns prevented from running backwards    5873808    9 intr
vcpu5 xen missed hardclock                               181709    0 intr
vcpu5 xen global_ns prevented from running backwards    4910426    8 intr
vcpu5 xen hardclock jumped past timecounter max             175    0 intr
vcpu6 xen missed hardclock                                39066    0 intr
vcpu6 xen local_ns one tick or more behind global_ns          1    0 intr
vcpu6 xen global_ns prevented from running backwards    5071788    8 intr
vcpu6 xen hardclock jumped past timecounter max              32    0 intr
vcpu7 xen missed hardclock                                 2672    0 intr
vcpu7 xen local_ns one tick or more behind global_ns          1    0 intr
vcpu7 xen global_ns prevented from running backwards    4037069    6 intr

					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>

