First off, one new data point: tsc_mode="native" is definitely problematic. Within only a matter of hours a NetBSD PV domU using that setting had wandered out of time by a few hundred seconds. Almost immediately its ntpd was complaining:

Jun  7 17:47:16 nbtest ntpd[845]: ntpd 4.2.8p14-o Mon May 25 15:53:34 EDT 2020 (import): Starting
Jun  7 17:47:16 nbtest ntpd[845]: Command line: /usr/sbin/ntpd -p /var/run/ntpd.pid -g
Jun  7 17:47:16 nbtest ntpd[845]: ----------------------------------------------------
Jun  7 17:47:16 nbtest ntpd[845]: ntp-4 is maintained by Network Time Foundation,
Jun  7 17:47:16 nbtest ntpd[845]: Inc. (NTF), a non-profit 501(c)(3) public-benefit
Jun  7 17:47:16 nbtest ntpd[845]: corporation. Support and training for ntp-4 are
Jun  7 17:47:16 nbtest ntpd[845]: available at https://www.nwtime.org/support
Jun  7 17:47:16 nbtest ntpd[845]: ----------------------------------------------------
Jun  7 17:47:16 nbtest ntpd[698]: proto: precision = 0.821 usec (-20)
Jun  7 17:47:16 nbtest ntpd[698]: basedate set to 2021-11-29
Jun  7 17:47:16 nbtest ntpd[698]: gps base set to 2021-12-05 (week 2187)
Jun  7 17:47:16 nbtest ntpd[698]: Listen and drop on 0 v4wildcard 0.0.0.0:123
Jun  7 17:47:16 nbtest ntpd[698]: Listen normally on 1 xennet0 10.0.1.143:123
Jun  7 17:47:16 nbtest ntpd[698]: Listen normally on 2 lo0 127.0.0.1:123
Jun  7 17:47:16 nbtest ntpd[698]: Listening on routing socket on fd #23 for interface updates
Jun  7 17:47:16 nbtest ntpd[698]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Jun  7 17:47:16 nbtest ntpd[698]: 0.0.0.0 c01d 0d kern kernel time sync enabled
Jun  7 17:47:16 nbtest ntpd[698]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Jun  7 17:47:16 nbtest ntpd[698]: 0.0.0.0 c012 02 freq_set kernel 60.507 PPM
Jun  7 17:47:16 nbtest ntpd[698]: 0.0.0.0 c016 06 restart
Jun  7 17:47:16 nbtest ntpd[698]: DNS ntp.local -> 10.0.1.140
Jun  7 17:47:16 nbtest ntpd[698]: 10.0.1.140 8011 81 mobilize assoc 48502
Jun  7 17:50:34 nbtest ntpd[698]: 10.0.1.140 8014 84 reachable
Jun  7 17:53:51 nbtest ntpd[698]: 10.0.1.140 901a 8a sys_peer
Jun  7 17:53:51 nbtest ntpd[698]: 0.0.0.0 c615 05 clock_sync
Jun  7 18:24:25 nbtest ntpd[698]: 0.0.0.0 0613 03 spike_detect -0.167483 s
Jun  7 18:26:35 nbtest ntpd[698]: 0.0.0.0 061c 0c clock_step -0.192934 s
Jun  7 18:26:35 nbtest ntpd[698]: 0.0.0.0 0615 05 clock_sync
Jun  7 18:26:35 nbtest /netbsd: [ 2397.7142877] Time stepped from 1717809995.619809967 to 1717809995.426869000
Jun  7 18:27:40 nbtest ntpd[698]: 0.0.0.0 c618 08 no_sys_peer
Jun  7 18:27:40 nbtest ntpd[698]: 10.0.1.140 8014 84 reachable
Jun  7 18:36:21 nbtest ntpd[698]: 10.0.1.140 901a 8a sys_peer
Jun  7 18:36:21 nbtest ntpd[698]: 0.0.0.0 c613 03 spike_detect -0.148030 s
Jun  7 18:38:33 nbtest ntpd[698]: 0.0.0.0 c61c 0c clock_step -0.198244 s
Jun  7 18:38:33 nbtest ntpd[698]: 0.0.0.0 c615 05 clock_sync
Jun  7 18:38:33 nbtest /netbsd: [ 3115.9076988] Time stepped from 1717810713.620280185 to 1717810713.422028000
Jun  7 18:39:40 nbtest ntpd[698]: 0.0.0.0 c618 08 no_sys_peer
Jun  7 18:39:40 nbtest ntpd[698]: 10.0.1.140 8014 84 reachable
Jun  7 18:48:25 nbtest ntpd[698]: 10.0.1.140 901a 8a sys_peer
Jun  7 18:48:25 nbtest ntpd[698]: 0.0.0.0 c613 03 spike_detect -0.379020 s
Jun  7 18:49:31 nbtest ntpd[698]: 0.0.0.0 c61c 0c clock_step -0.514984 s
Jun  7 18:49:31 nbtest ntpd[698]: 0.0.0.0 c615 05 clock_sync
Jun  7 18:49:31 nbtest /netbsd: [ 3774.1063171] Time stepped from 1717811371.620646193 to 1717811371.105652000
Jun  7 18:50:35 nbtest ntpd[698]: 0.0.0.0 c618 08 no_sys_peer

[[ ... and so on with spikes up to 81s that it recovered from, until ... ]]

Jun  8 14:06:34 nbtest ntpd[698]: 10.0.1.140 901a 8a sys_peer
Jun  8 14:06:34 nbtest ntpd[698]: 0.0.0.0 c613 03 spike_detect -234.216826 s
Jun  8 14:08:42 nbtest ntpd[698]: 0.0.0.0 c618 08 no_sys_peer

This is somewhat more dramatic than the behaviour of domUs running with
tsc_mode=default (which, with my CPUs, should be the same as
always_emulate).  I don't see ntpd trying to adjust the clock -- I think
it drifts/jumps too far too suddenly.  Anyway, I expected this, and I'm
glad to see it confirmed.  Another domU on the same machine with
tsc_mode=always_emulate is still keeping good time.

So, if your CPU doesn't have TSC-INVARIANT you need always_emulate (and
if you want to do server migrations then you probably still need it even
on CPUs that do).
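For anyone who wants to try the same comparison, the guest side of it is
a single line in the xl config.  A sketch only -- everything except
tsc_mode is made up for illustration:

    # illustrative xl guest config fragment -- only tsc_mode matters
    # here; the name and sizes are invented
    name   = "nbtest"
    type   = "pv"
    vcpus  = 2
    memory = 2048

    # "native" is the setting that misbehaved above; "always_emulate"
    # has kept good time here.  "default" lets Xen decide, which on
    # CPUs without TSC-INVARIANT should come out as always_emulate.
    tsc_mode = "always_emulate"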
At Sat, 8 Jun 2024 08:59:29 -0700, Brian Buhrow <buhrow%nfbcal.org@localhost> wrote:
Subject: Re: timekeeping regression?
>
> 	Hello.  In following this thread, I have some questions which
> may or may not be helpful in pinpointing the problem.  If I remember
> the details of the thread correctly, Greg is running NetBSD as a dom0
> and as domu's on the same machine.  Domu's which are running FreeBSD
> in HVM mode on the same machine are not exhibiting the time problem,
> even though both the dom0 and domu's running NetBSD are exhibiting
> the behavior.  The domu's exhibiting the behavior are all running in
> pv mode.

That's about right, except for the FreeBSD domU bit....

> 	1.  In the dom0, it looks like you can tune the frequency at
> which NetBSD updates Xen's notion of the time through a sysctl
> variable.  The variable represents the number of clock ticks on the
> cpu dom0 is using for its time counter, at least, that's what it
> looks like to me.  In any case, what frequency is your dom0 using for
> updating Xen's notion of the time?  Do you get better, worse or no
> change in behavior if you change that frequency?  Is it using the
> same frequency as FreeBSD uses?

This shouldn't have any effect, except microscopically for newly
created domUs IFF, like NetBSD, they fetch their initial "real time
clock" value from the Xen hypervisor.  I.e. that setting is just the
update frequency for the Xen kernel's current wall clock time, and
should not affect its emulation of RDTSC.
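For the record, the dom0-side knob Brian is describing should be the
"timepush" tick count.  I'm writing the sysctl name from memory here,
so treat it as an assumption and list the machdep.xen subtree first to
confirm what your kernel actually calls it:

    # on the NetBSD dom0 (node name from memory -- verify it exists)
    sysctl machdep.xen
    sysctl machdep.xen.timepush_ticks
    sysctl -w machdep.xen.timepush_ticks=100   # value is in clock ticks (1/hz units)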
> 	2.  Do you get the same trouble on domu's running in pvh mode?

Dunno, haven't got that far yet with NetBSD, but I have a fresh 10.0
instance ready to try with PVH once it's run its course of 7-8 days
uptime with tsc_mode=always_emulate to see if it eventually wanders.
I wanted to try PVH/pvshim too, but I don't think I'll bother as I
can't see it making any difference.

I am running FreeBSD-14 in PVH though, and it is working fine.  I
don't remember what timecounter was preferred by FreeBSD under HVM.
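It's easy enough to check next time I'm logged in there, since NetBSD
and FreeBSD both report the timecounter in use (and the candidates)
under the same sysctl names:

    # works on both NetBSD and FreeBSD guests
    sysctl kern.timecounter.hardware   # timecounter currently in use
    sysctl kern.timecounter.choice     # all candidates, with quality values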
> 	3.  Do domu's with only 1 vcpu exhibit the bad behavior?  There
> are a lot of notes about making sure the cpu doesn't change in the
> middle of time fetching operations in the xen_clock.c file.

I've never tested any domU with only 1 vCPU.  I would think it would
have to be pinned too.  I've only ever tried pinning dom0's CPUs, and
I have tried running dom0 with only one vCPU, but neither made any
difference, and I'm beginning to understand it should _not_ make any
difference either.  Dom0 doesn't control the hypervisor's emulation of
RDTSC after all.

The only weird part is that my dom0s never wander in time and yet
they're using the same xen_system_time timecounter and presumably the
same RDTSC emulation by the hypervisor kernel.  :-/
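If someone wants to run Brian's single-vCPU experiment, the guest
config side of it would look something like the following sketch (the
physical CPU number is an arbitrary choice of mine):

    # one vCPU, pinned so it can never move between physical CPUs
    vcpus = 1
    cpus  = "3"    # pin to pCPU 3 (arbitrary)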
BTW, note also what the comment says in this code from FreeBSD:

static uint32_t
xentimer_get_timecount(struct timecounter *tc)
{
	uint64_t vcpu_time;

	/*
	 * We don't disable preemption here because the worst that can
	 * happen is reading the vcpu_info area of a different CPU than
	 * the one we are currently running on, but that would also
	 * return a valid tc (and we avoid the overhead of
	 * critical_{enter/exit} calls).
	 */
	vcpu_time = xen_fetch_vcpu_time(DPCPU_GET(vcpu_info));
	return (vcpu_time & UINT32_MAX);
}

> 	4.  What version of xen_clock.c are you running?  It looks like
> some fixes were incorporated into V1.18 to try and address this
> issue.

Mostly v1.8, with two kernels including the v1.12 change, one being
10.0.
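(For anyone checking their own kernels: NetBSD embeds the source
revision strings via __KERNEL_RCSID(), so the running kernel's
xen_clock.c revision can usually be recovered with ident(1), assuming
the ID strings weren't compiled out:)

    ident /netbsd | grep xen_clock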
> 	As a data point, I'm running NetBSD-9.99.77, from January 2021,
> as both pv and pvh domu's with FreeBSD-13 as the dom0, with Xen-4.16.0
> and I'm not seeing these time issues on the domu's in question.  And,
> that's with xen_clock.c, V1.8.  I'm also running some old NetBSD-5.2
> domu's in pv mode and they're not seeing these time issues either.

I should try an older NetBSD.  If I had more spare hardware, and time,
I would try each Xen kernel release as well....

--
Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC	+1 250 762-7675		RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>	Avoncote Farms <woods%avoncote.ca@localhost>