Re: timekeeping regression?

To: NetBSD/xen Discussion List <port-xen%NetBSD.org@localhost>
Subject: Re: timekeeping regression?
From: "Greg A. Woods" <woods%planix.ca@localhost>
Date: Tue, 18 Jun 2024 16:36:27 -0700

So, I know now why we want to use "dom0_vcpus_pin=true" w.r.t. timekeeping!

I updated xen_clock.c to 1.18 and turned on XEN_CLOCK_DEBUG (and then
commented out one of the super-noisy device_printf() calls that actually
caused the system to hang) and I started seeing thousands of printfs
like the following, but only on dom0, and only on the one machine where
I didn't have dom0's CPUs pinned.

[ 83329.4245423] xen raw systime + tsc delta went backwards: 82591317579681 > 82591299251748
[ 83329.4245423]  raw_systime_ns=82590641756625
[ 83329.4245423]  tsc_timestamp=233578790859082
[ 83329.4245423]  tsc=233580649104491
[ 83329.4245423]  tsc_to_system_mul=3039340271
[ 83329.4245423]  tsc_shift=-1
[ 83329.4245423]  delta_tsc=1858245409
[ 83329.4245423]  delta_ns=657495123
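
As a sanity check on those numbers, the arithmetic can be redone by hand. What follows is only a sketch: it assumes the scaling documented for Xen's vcpu_time_info, ns = ((delta_tsc << tsc_shift) * tsc_to_system_mul) >> 32, with a negative tsc_shift meaning a right shift. The name scale_delta is made up for illustration and is not taken from xen_clock.c.

```python
# Values copied from the debug printf above.
raw_systime_ns = 82590641756625
tsc_timestamp = 233578790859082
tsc = 233580649104491
tsc_to_system_mul = 3039340271
tsc_shift = -1
previous_ns = 82591317579681  # left-hand side of the "went backwards" message


def scale_delta(delta_tsc, mul, shift):
    """Assumed pvclock scaling: shift the tick delta, multiply, drop 32 bits."""
    if shift < 0:
        delta_tsc >>= -shift
    else:
        delta_tsc <<= shift
    return (delta_tsc * mul) >> 32


delta_tsc = tsc - tsc_timestamp
delta_ns = scale_delta(delta_tsc, tsc_to_system_mul, tsc_shift)
now_ns = raw_systime_ns + delta_ns

print("delta_tsc =", delta_tsc)
print("delta_ns  =", delta_ns)
print("now_ns    =", now_ns)
print("behind previous reading by", previous_ns - now_ns, "ns")
```

Under that assumption the sketch reproduces the logged delta_tsc and delta_ns, and now_ns comes out equal to the right-hand number in the first line of the message, so the scaling arithmetic itself looks fine. The new reading is simply about 18 ms behind the previous one.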


Make that hundreds of thousands in less than a day:

# uptime
 4:33PM  up 23:26, 2 users, load averages: 0.08, 0.02, 0.01

vcpu0 raw systime went backwards    395276    4 intr
vcpu0 missed hardclock              423534    5 intr
vcpu0 timecounter went backwards    242583    2 intr

vcpu1 raw systime went backwards    261025    3 intr
vcpu1 missed hardclock              462819    5 intr
vcpu1 timecounter went backwards    256918    3 intr


Time also drifted:

# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 xentastic.local 192.75.191.16    3 u  105  256  377    0.469  -724851 6055.97



So I pinned them at runtime:

# xl vcpu-list Domain-0
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    3   r--     806.0  all / all
Domain-0                             0     1    2   -b-     715.6  all / all
# xl vcpu-pin 0 0 0
# xl vcpu-pin 0 1 1
# xl vcpu-list Domain-0
Name                                ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
Domain-0                             0     0    0   -b-     807.9  0 / all
Domain-0                             0     1    1   r--     716.6  1 / all

And voila!  Instantly, no more "raw systime went backwards" events!

Also, ntpd is able to hold the clock stable again (after a reset
step by ntpdate).
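
For a dom0 with more than two vCPUs, the same one-to-one pinning could be generated instead of typed. A minimal sketch, assuming there are at least as many physical CPUs as dom0 vCPUs and that domain ID 0 is dom0 as in the commands above. The helper name pin_commands is hypothetical, and it only prints the xl commands; it does not run them.

```python
def pin_commands(nvcpus, domain="0"):
    """Build one `xl vcpu-pin <domain> <vcpu> <pcpu>` command per vCPU.

    Assumption: vCPU n is pinned to physical CPU n.
    """
    return [["xl", "vcpu-pin", domain, str(n), str(n)] for n in range(nvcpus)]


# Print the commands rather than running them, so they can be reviewed first.
for cmd in pin_commands(2):
    print(" ".join(cmd))
```

For two vCPUs this gives the same two xl vcpu-pin commands used above.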


I thought this might be because there's no way (that I know of) to set
the tsc_mode for dom0, but given that the tsc_to_system_mul shown in the
debug printf is about what it should be to round down to 1GHz on this
machine, it seems RDTSC must be emulated.
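
One way to cross-check that multiplier is to turn the scaling fields back into a tick rate and compare it with whatever rate the counter is expected to run at. This is only a sketch, assuming the usual pvclock convention that one tick is worth tsc_to_system_mul / 2^32 * 2^tsc_shift nanoseconds. The helper name implied_tsc_hz is made up for illustration and is not something in xen_clock.c.

```python
# Scaling fields copied from the debug printf.
tsc_to_system_mul = 3039340271
tsc_shift = -1


def implied_tsc_hz(mul, shift):
    """Tick rate implied by the scaling fields, under the assumed convention."""
    ns_per_tick = (mul / 2**32) * (2.0 ** shift)
    return 1e9 / ns_per_tick


hz = implied_tsc_hz(tsc_to_system_mul, tsc_shift)
print(f"implied tick rate: {hz / 1e9:.3f} GHz")
```

Under this convention, a counter running at exactly 1GHz would correspond to one nanosecond per tick, for example a multiplier of 2^31 with a shift of 1.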

I guess the RDTSC emulation must not be stable across CPUs?  Or?


Now I wait some days again to see if the newest xen_clock.c gives me any
more clues as to why domU clocks begin to drift after ~7.5 days of
uptime (if that still happens).....

--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>

