First off, one new data point: tsc_mode="native" is definitely problematic. Within only a matter of hours a NetBSD PV domU using that setting had wandered out of time by a few hundred seconds. Almost immediately its ntpd was complaining:

Jun  7 17:47:16 nbtest ntpd[845]: ntpd 4.2.8p14-o Mon May 25 15:53:34 EDT 2020 (import): Starting
Jun  7 17:47:16 nbtest ntpd[845]: Command line: /usr/sbin/ntpd -p /var/run/ntpd.pid -g
Jun  7 17:47:16 nbtest ntpd[845]: ----------------------------------------------------
Jun  7 17:47:16 nbtest ntpd[845]: ntp-4 is maintained by Network Time Foundation,
Jun  7 17:47:16 nbtest ntpd[845]: Inc. (NTF), a non-profit 501(c)(3) public-benefit
Jun  7 17:47:16 nbtest ntpd[845]: corporation. Support and training for ntp-4 are
Jun  7 17:47:16 nbtest ntpd[845]: available at https://www.nwtime.org/support
Jun  7 17:47:16 nbtest ntpd[845]: ----------------------------------------------------
Jun  7 17:47:16 nbtest ntpd[698]: proto: precision = 0.821 usec (-20)
Jun  7 17:47:16 nbtest ntpd[698]: basedate set to 2021-11-29
Jun  7 17:47:16 nbtest ntpd[698]: gps base set to 2021-12-05 (week 2187)
Jun  7 17:47:16 nbtest ntpd[698]: Listen and drop on 0 v4wildcard 0.0.0.0:123
Jun  7 17:47:16 nbtest ntpd[698]: Listen normally on 1 xennet0 10.0.1.143:123
Jun  7 17:47:16 nbtest ntpd[698]: Listen normally on 2 lo0 127.0.0.1:123
Jun  7 17:47:16 nbtest ntpd[698]: Listening on routing socket on fd #23 for interface updates
Jun  7 17:47:16 nbtest ntpd[698]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Jun  7 17:47:16 nbtest ntpd[698]: 0.0.0.0 c01d 0d kern kernel time sync enabled
Jun  7 17:47:16 nbtest ntpd[698]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Jun  7 17:47:16 nbtest ntpd[698]: 0.0.0.0 c012 02 freq_set kernel 60.507 PPM
Jun  7 17:47:16 nbtest ntpd[698]: 0.0.0.0 c016 06 restart
Jun  7 17:47:16 nbtest ntpd[698]: DNS ntp.local -> 10.0.1.140
Jun  7 17:47:16 nbtest ntpd[698]: 10.0.1.140 8011 81 mobilize assoc 48502
Jun  7 17:50:34 nbtest ntpd[698]: 10.0.1.140 8014 84 reachable
Jun  7 17:53:51 nbtest ntpd[698]: 10.0.1.140 901a 8a sys_peer
Jun  7 17:53:51 nbtest ntpd[698]: 0.0.0.0 c615 05 clock_sync
Jun  7 18:24:25 nbtest ntpd[698]: 0.0.0.0 0613 03 spike_detect -0.167483 s
Jun  7 18:26:35 nbtest ntpd[698]: 0.0.0.0 061c 0c clock_step -0.192934 s
Jun  7 18:26:35 nbtest ntpd[698]: 0.0.0.0 0615 05 clock_sync
Jun  7 18:26:35 nbtest /netbsd: [ 2397.7142877] Time stepped from 1717809995.619809967 to 1717809995.426869000
Jun  7 18:27:40 nbtest ntpd[698]: 0.0.0.0 c618 08 no_sys_peer
Jun  7 18:27:40 nbtest ntpd[698]: 10.0.1.140 8014 84 reachable
Jun  7 18:36:21 nbtest ntpd[698]: 10.0.1.140 901a 8a sys_peer
Jun  7 18:36:21 nbtest ntpd[698]: 0.0.0.0 c613 03 spike_detect -0.148030 s
Jun  7 18:38:33 nbtest ntpd[698]: 0.0.0.0 c61c 0c clock_step -0.198244 s
Jun  7 18:38:33 nbtest ntpd[698]: 0.0.0.0 c615 05 clock_sync
Jun  7 18:38:33 nbtest /netbsd: [ 3115.9076988] Time stepped from 1717810713.620280185 to 1717810713.422028000
Jun  7 18:39:40 nbtest ntpd[698]: 0.0.0.0 c618 08 no_sys_peer
Jun  7 18:39:40 nbtest ntpd[698]: 10.0.1.140 8014 84 reachable
Jun  7 18:48:25 nbtest ntpd[698]: 10.0.1.140 901a 8a sys_peer
Jun  7 18:48:25 nbtest ntpd[698]: 0.0.0.0 c613 03 spike_detect -0.379020 s
Jun  7 18:49:31 nbtest ntpd[698]: 0.0.0.0 c61c 0c clock_step -0.514984 s
Jun  7 18:49:31 nbtest ntpd[698]: 0.0.0.0 c615 05 clock_sync
Jun  7 18:49:31 nbtest /netbsd: [ 3774.1063171] Time stepped from 1717811371.620646193 to 1717811371.105652000
Jun  7 18:50:35 nbtest ntpd[698]: 0.0.0.0 c618 08 no_sys_peer

[[ ... and so on with spikes up to 81s that it recovered from, until ... ]]

Jun  8 14:06:34 nbtest ntpd[698]: 10.0.1.140 901a 8a sys_peer
Jun  8 14:06:34 nbtest ntpd[698]: 0.0.0.0 c613 03 spike_detect -234.216826 s
Jun  8 14:08:42 nbtest ntpd[698]: 0.0.0.0 c618 08 no_sys_peer

This is somewhat more dramatic than the behaviour of domUs running with
tsc_mode=default (which, with my CPUs, should be the same as
always_emulate).  I don't see ntpd trying to adjust the clock -- I think
it drifts/jumps too far too suddenly.  Anyway, I expected this, and I'm
glad to see it confirmed.  Another domU on the same machine with
tsc_mode=always_emulate is still keeping good time.

So, if your CPU doesn't have TSC-INVARIANT you need always_emulate (and
if you want to do server migrations then you probably still need it even
on CPUs that do).
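For anyone who wants to try the same comparison, the guest side of it is
a single line in the xl config.  A sketch only -- everything except
tsc_mode is made up for illustration:

    # illustrative xl guest config fragment -- only tsc_mode matters
    # here; the name and sizes are invented
    name   = "nbtest"
    type   = "pv"
    vcpus  = 2
    memory = 2048

    # "native" is the setting that misbehaved above; "always_emulate"
    # has kept good time here.  "default" lets Xen decide, which on
    # CPUs without TSC-INVARIANT should come out as always_emulate.
    tsc_mode = "always_emulate"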
At Sat, 8 Jun 2024 08:59:29 -0700, Brian Buhrow <buhrow%nfbcal.org@localhost> wrote:
Subject: Re: timekeeping regression?
>
> 	Hello.  In following this thread, I have some questions which
> may or may not be helpful in pinpointing the problem.  If I remember
> the details of the thread correctly, Greg is running NetBSD as a dom0
> and as domu's on the same machine.  Domu's which are running FreeBSD
> in HVM mode on the same machine are not exhibiting the time problem,
> even though both the dom0 and domu's running NetBSD are exhibiting
> the behavior.  The domu's exhibiting the behavior are all running in
> pv mode.

That's about right, except for the FreeBSD domU bit....

> 	1.  In the dom0, it looks like you can tune the frequency at
> which NetBSD updates Xen's notion of the time through a sysctl
> variable.  The variable represents the number of clock ticks on the
> cpu dom0 is using for its time counter, at least, that's what it
> looks like to me.  In any case, what frequency is your dom0 using for
> updating Xen's notion of the time?  Do you get better, worse or no
> change in behavior if you change that frequency?  Is it using the
> same frequency as FreeBSD uses?

This shouldn't have any effect, except microscopically for newly
created domUs IFF, like NetBSD, they fetch their initial "real time
clock" value from the Xen hypervisor.  I.e. that setting is just the
update frequency for the Xen kernel's current wall clock time, and
should not affect its emulation of RDTSC.
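For the record, the dom0-side knob Brian is describing should be the
"timepush" tick count.  I'm writing the sysctl name from memory here,
so treat it as an assumption and list the machdep.xen subtree first to
confirm what your kernel actually calls it:

    # on the NetBSD dom0 (node name from memory -- verify it exists)
    sysctl machdep.xen
    sysctl machdep.xen.timepush_ticks
    sysctl -w machdep.xen.timepush_ticks=100   # value is in clock ticks (1/hz units)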
> 	2.  Do you get the same trouble on domu's running in pvh mode?

Dunno, haven't got that far yet with NetBSD, but I have a fresh 10.0
instance ready to try with PVH once it's run its course of 7-8 days
uptime with tsc_mode=always_emulate to see if it eventually wanders.
I wanted to try PVH/pvshim too, but I don't think I'll bother as I
can't see it making any difference.

I am running FreeBSD-14 in PVH though, and it is working fine.  I
don't remember what timecounter was preferred by FreeBSD under HVM.
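It's easy enough to check next time I'm logged in there, since NetBSD
and FreeBSD both report the timecounter in use (and the candidates)
under the same sysctl names:

    # works on both NetBSD and FreeBSD guests
    sysctl kern.timecounter.hardware   # timecounter currently in use
    sysctl kern.timecounter.choice     # all candidates, with quality values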
> 	3.  Do domu's with only 1 vcpu exhibit the bad behavior?  There
> are a lot of notes about making sure the cpu doesn't change in the
> middle of time fetching operations in the xen_clock.c file.

I've never tested any domU with only 1 vCPU.  I would think it would
have to be pinned too.  I've only ever tried pinning dom0's CPUs, and
I have tried running dom0 with only one vCPU, but neither made any
difference, and I'm beginning to understand it should _not_ make any
difference either.  Dom0 doesn't control the hypervisor's emulation of
RDTSC after all.

The only weird part is that my dom0s never wander in time and yet
they're using the same xen_system_time timecounter and presumably the
same RDTSC emulation by the hypervisor kernel.  :-/
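If someone wants to run Brian's single-vCPU experiment, the guest
config side of it would look something like the following sketch (the
physical CPU number is an arbitrary choice of mine):

    # one vCPU, pinned so it can never move between physical CPUs
    vcpus = 1
    cpus  = "3"    # pin to pCPU 3 (arbitrary)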
BTW, note also what the comment says in this code from FreeBSD:

static uint32_t
xentimer_get_timecount(struct timecounter *tc)
{
	uint64_t vcpu_time;

	/*
	 * We don't disable preemption here because the worst that can
	 * happen is reading the vcpu_info area of a different CPU than
	 * the one we are currently running on, but that would also
	 * return a valid tc (and we avoid the overhead of
	 * critical_{enter/exit} calls).
	 */
	vcpu_time = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info));
	return (vcpu_time & UINT32_MAX);
}

> 	4.  What version of xen_clock.c are you running?  It looks like
> some fixes were incorporated into V1.18 to try and address this
> issue.

Mostly v1.8, with two kernels including the v1.12 change, one being
10.0.
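(For anyone checking their own kernels: NetBSD embeds the source
revision strings via __KERNEL_RCSID(), so the running kernel's
xen_clock.c revision can usually be recovered with ident(1), assuming
the ID strings weren't compiled out:)

    ident /netbsd | grep xen_clock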
> 	As a data point, I'm running NetBSD-9.99.77, from January 2021,
> as both pv and pvh domu's with FreeBSD-13 as the dom0, with Xen-4.16.0
> and I'm not seeing these time issues on the domu's in question.  And,
> that's with xen_clock.c, V1.8.  I'm also running some old NetBSD-5.2
> domu's in pv mode and they're not seeing these time issues either.

I should try an older NetBSD.  If I had more spare hardware, and time,
I would try each Xen kernel release as well....

--
Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC	+1 250 762-7675		RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>	Avoncote Farms <woods%avoncote.ca@localhost>