At Wed, 19 Jun 2024 15:34:10 -0700, "Greg A. Woods" <woods%planix.ca@localhost> wrote:
Subject: Re: timekeeping regression?
>
> After all it does have to have something to do with the Xen hypervisor
> as I didn't see time warps in domUs either until I upgraded to 4.18.

So, like clockwork, my one machine that has now been running again since
reboot for nearly 8 days has exhibited time drift in its NetBSD domUs.

This is the uptime of the dom0:

xentastic  up 7+20:55,  0 users,  load 0.00, 0.00, 0.00

It runs the following domUs:

Name        ID    Mem  VCPUs  State   Time(s)
Domain-0     0  12288      8  r-----  23004.7
fezzik       2   3936      4  -b----   2870.7
nb10         3   2000      4  -b----   2812.4
nbtest       9   2000      4  -b----  10413.1

fezzik     up 7+20:56,  1 user,   load 0.79, 0.78, 0.75
nb10       up 7+20:51,  0 users,  load 0.01, 0.04, 0.00
nbtest     up 6+23:08,  0 users,  load 0.04, 0.04, 0.00

Nb10 is stock NetBSD-10.0 (XEN3_DOMU).

Fezzik, the FreeBSD-14 domU, is running A-OK with near-perfect time.

Note that nbtest, running an ancient -current but with xen_clock.c:1.18,
was rebooted after the machine was brought up and still hasn't exceeded
the magic ~7.5 days of uptime, yet it is experiencing time drift.

Nbtest doesn't show anything really new in the event counters -- they
are still running at about the same rates as before the time skew.

vcpu0 missed hardclock              421557923  700 intr
vcpu0 timecounter went backwards     42769579   71 intr
vcpu1 missed hardclock              260820284  433 intr
vcpu1 timecounter went backwards       395662    0 intr
vcpu2 missed hardclock              244147923  405 intr
vcpu2 timecounter went backwards       359912    0 intr
vcpu3 missed hardclock              262151092  435 intr
vcpu3 timecounter went backwards       329019    0 intr

The only new clue here is that the domU uptime doesn't seem to matter --
it's the whole machine's uptime that matters.
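
As a rough cross-check of the nbtest counters above, the per-second
column can be reproduced by dividing each total by the domU's uptime.
The sketch below assumes that column is simply the total divided by
seconds since boot, truncated to an integer, and that the counters were
sampled at about the same moment as the 6+23:08 uptime.  The helper
names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: convert an uptime of the form D+HH:MM to seconds. */
static uint64_t
uptime_seconds(uint64_t days, uint64_t hours, uint64_t minutes)
{
	return days * 86400 + hours * 3600 + minutes * 60;
}

/*
 * Hypothetical helper: average events per second since boot, truncated
 * to an integer (assumed to match how the rate column is computed).
 */
static uint64_t
events_per_second(uint64_t total, uint64_t uptime_s)
{
	return uptime_s == 0 ? 0 : total / uptime_s;
}
```

With nbtest's uptime of 6+23:08 (601680 seconds),
events_per_second(421557923, 601680) gives 700 and
events_per_second(42769579, 601680) gives 71, matching the vcpu0 rows.
Under those assumptions the column is a long-run average since boot, so
it would only slowly reflect any recent change in rate.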

The two NetBSD domUs with about 21 hours difference in uptime were only
about 30 minutes apart in when mDNSResponder first detected time skew:

Jun 24 07:11:50 nb10 mDNSResponder: mDNS_Execute,5348: mDNSPlatformRawTime went backwards by 758 ticks; setting correction factor to 751771850
Jun 24 08:45:53 nbtest mDNSResponder: mDNSCoreReceive,10552: mDNSPlatformRawTime went backwards by 370 ticks; setting correction factor to 3970002857

Note mdnsd is currently using gettimeofday(), though it does have code
to use clock_gettime() with CLOCK_MONOTONIC.  In any case as-is it is
correctly observing the time-of-day skew.

Those "timecounter went backwards" events, especially at a relatively
high and constant rate on vcpu0, are possibly concerning.

Something I may have mentioned before....  In FreeBSD they have a much
simpler tc_get_timecounter implementation with the following comment

	/*
	 * We don't disable preemption here because the worst that can
	 * happen is reading the vcpu_info area of a different CPU than
	 * the one we are currently running on, but that would also
	 * return a valid tc (and we avoid the overhead of
	 * critical_{enter/exit} calls).
	 */

-- 
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>