At Fri, 14 Feb 2025 13:58:38 -0800, "Greg A. Woods" <woods%planix.ca@localhost> wrote:
Subject: Re: timekeeping regression
>
> The problem begins somewhere between Xen-4.11 and Xen-4.18 (probably
> since 4.13 actually, though that'll be proven in about a week's time).
> I've reinstalled 4.11 on one of the machines and it has been running
> with a stable clock for nearly 15 days now, and I recall previously
> having it run for much longer without problem.

So, yes, 4.13 is indeed continuing to keep good time well after 7.5 days
of uptime.

I guess I should try 4.15.

BTW, Andrew Cooper mentioned to me on xen-devel:

    Time handling is a known swamp.  I can believe something has changed
    since 4.13, but I wouldn't say it was working back then either.

It still has/had a high rate of "skew" between TSC counters on different
vCPUs that shows up as hits in the/my new xen_clock.c's algorithm for
keeping global_ns (xen_global_systime_ns_stamp, used for the
xen_system_time timecounter) from going backwards:

# vmstat -e | fgrep xen | fgrep -v xenev0
vcpu0 xen missed hardclock                               538894    0 intr
vcpu0 xen global_ns prevented from running backwards   45972839   66 intr
vcpu1 xen missed hardclock                               989203    1 intr
vcpu1 xen global_ns prevented from running backwards   71521339  104 intr

One of the domUs (running with a XEN_CLOCK_DEBUG kernel) has reported a
slew of quite odd "hardclock jumped past timecounter max" events,
something I've never seen before.  Its system clock, with ntpd, is still
keeping very good time though.

[ 461875.9597960] WARNING: hardclock jumped past timecounter max 545582329112350ns (545617865148933 -> 35536036583), exceeding maximum of 4294967295ns for timecounter(9)
[[ ... repeating with adjustments until finally ... ]]
[ 461897.6710341] WARNING: hardclock jumped past timecounter max 545634329112112ns (545639575735115 -> 5246623003), exceeding maximum of 4294967295ns for timecounter(9)

It happened again a little while later:

[ 462019.2340306] WARNING: hardclock jumped past timecounter max 545582329112112ns (545761131143311 -> 178802031199), exceeding maximum of 4294967295ns for timecounter(9)
[[ ... repeating with adjustments until finally ... ]]
[ 462019.2560543] WARNING: hardclock jumped past timecounter max 545756329112112ns (545761153683767 -> 4824571655), exceeding maximum of 4294967295ns for timecounter(9)

# vmstat -e | fgrep xen | fgrep -v xenev0
vcpu0 xen missed hardclock                                 6862    0 intr
vcpu0 xen local_ns one tick or more behind global_ns          3    0 intr
vcpu0 xen global_ns prevented from running backwards    3870125    6 intr
vcpu1 xen missed hardclock                                 1771    0 intr
vcpu1 xen local_ns one tick or more behind global_ns          1    0 intr
vcpu1 xen global_ns prevented from running backwards    4417397    7 intr
vcpu2 xen missed hardclock                                 2460    0 intr
vcpu2 xen local_ns one tick or more behind global_ns          1    0 intr
vcpu2 xen global_ns prevented from running backwards    3436860    5 intr
vcpu3 xen missed hardclock                                59699    0 intr
vcpu3 xen global_ns prevented from running backwards    4839043    8 intr
vcpu3 xen hardclock jumped past timecounter max              53    0 intr
vcpu4 xen missed hardclock                                 2423    0 intr
vcpu4 xen global_ns prevented from running backwards    5873808    9 intr
vcpu5 xen missed hardclock                               181709    0 intr
vcpu5 xen global_ns prevented from running backwards    4910426    8 intr
vcpu5 xen hardclock jumped past timecounter max             175    0 intr
vcpu6 xen missed hardclock                                39066    0 intr
vcpu6 xen local_ns one tick or more behind global_ns          1    0 intr
vcpu6 xen global_ns prevented from running backwards    5071788    8 intr
vcpu6 xen hardclock jumped past timecounter max              32    0 intr
vcpu7 xen missed hardclock                                 2672    0 intr
vcpu7 xen local_ns one tick or more behind global_ns          1    0 intr
vcpu7 xen global_ns prevented from running backwards    4037069    6 intr

-- 
Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675
RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>
Avoncote Farms <woods%avoncote.ca@localhost>
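
For readers unfamiliar with the two counters discussed above, here is a minimal user-space sketch of the general idea. It is not the code in xen_clock.c. Every name in it (last_global_ns, backwards_events, monotonic_ns, jump_ns, jump_exceeds_max) is hypothetical, and it assumes plain C11 atomics rather than the kernel's own primitives. The first function clamps a per-vCPU reading so the shared timestamp never runs backwards and counts each clamp. The second pair computes the difference between two timestamps and compares it with the 4294967295ns maximum quoted in the warnings. The post does not say what the two values in parentheses denote, so the sketch only takes their difference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the names used in xen_clock.c. */
static _Atomic uint64_t last_global_ns;    /* newest timestamp handed out, ns */
static _Atomic uint64_t backwards_events;  /* how often a reading was clamped */

/* 2^32 - 1 ns, the maximum quoted in the warning messages. */
#define TC_MAX_NS UINT64_C(4294967295)

/*
 * Return a timestamp that never goes backwards across callers.
 * A reading older than the shared value is replaced by the shared
 * value and counted; a newer reading is published with a CAS loop.
 */
static uint64_t
monotonic_ns(uint64_t local_ns)
{
	uint64_t seen = atomic_load(&last_global_ns);

	for (;;) {
		if (local_ns <= seen) {
			if (local_ns < seen)
				atomic_fetch_add(&backwards_events, 1);
			return seen;
		}
		/* On failure, seen is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&last_global_ns, &seen,
		    local_ns))
			return local_ns;
	}
}

/* Difference between two timestamps; the caller passes the larger first. */
static uint64_t
jump_ns(uint64_t larger_ns, uint64_t smaller_ns)
{
	assert(larger_ns >= smaller_ns);
	return larger_ns - smaller_ns;
}

/* Nonzero when the jump is larger than the quoted maximum. */
static int
jump_exceeds_max(uint64_t larger_ns, uint64_t smaller_ns)
{
	return jump_ns(larger_ns, smaller_ns) > TC_MAX_NS;
}
```

With the values from the first warning, jump_ns(545617865148933, 35536036583) gives 545582329112350, which matches the figure printed in that message and is far above the 4294967295ns limit.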