Re: timekeeping regression?

To: NetBSD/xen Discussion List <port-xen%NetBSD.org@localhost>
Subject: Re: timekeeping regression?
From: "Greg A. Woods" <woods%planix.ca@localhost>
Date: Mon, 10 Jun 2024 20:45:25 -0700

At Mon, 10 Jun 2024 19:45:07 -0400, Brad Spencer <brad%anduin.eldar.org@localhost> wrote:
Subject: Re: timekeeping regression?
>
> "Greg A. Woods" <woods%planix.ca@localhost> writes:
> >
> > It has recorded a drift value just 50.172, though oddly it is not
> > updating the file hourly like ntp.conf(5) suggests it should be doing.
> > It hasn't written the file since booting.  None of my VMs have drift
> > values over 66.
>
> I suspect it hasn't updated the file because the VM has lost sync with
> the server.  It appears that the drift file on mine is being updated.
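
For a sense of scale, the drift-file values quoted above are frequency
offsets in parts per million, so they convert directly to seconds gained
or lost per day.  A minimal sketch of that arithmetic follows; the
function name is made up for illustration, and the 500 ppm figure is
ntpd's usual maximum frequency correction, included here only as a point
of comparison, not something measured on these VMs:

```python
SECONDS_PER_DAY = 86_400


def drift_seconds_per_day(ppm: float) -> float:
    """Convert a frequency offset in parts per million to seconds of drift per day."""
    return ppm * 1e-6 * SECONDS_PER_DAY


if __name__ == "__main__":
    # 50.172 and 66 are the values from the quoted text; 500 is ntpd's
    # usual frequency-correction limit, shown only for comparison.
    for ppm in (50.172, 66.0, 500.0):
        print(f"{ppm:8.3f} ppm -> {drift_seconds_per_day(ppm):7.3f} s/day")
```

So a drift of 50.172 works out to a little over four seconds a day, which
is well within what ntpd can normally discipline.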

Ah, yes, it had not quite sunk in when I was looking at those logs that
clock_sync was not being maintained for more than about a minute (if
even that long -- probably only for the instant when the clock_sync log
entry was generated), which I guess is more or less what could be
expected given the way I (mis)configured this system for testing.

Note that most of the rest of your analysis isn't really meaningful for
this particular system as it is a deliberate test to see what happens
when Xen is _not_ messing with the RDTSC instruction.  (Note though that
on my LAN I only intend to run one NTP server, so LAN clients normally
only ever have this one local server configured, so all of that part is
"normal" and as-intended.)

Anyway, the result with tsc_mode=native is more or less matching what I
would expect to see on bare multi-core hardware with an older CPU (as
this is) if one forced a NetBSD kernel running full SMP to use TSC as
its timecounter source.

It does though also show that NTPd is remarkably persistent at trying to
keep the clock in line if things aren't too wonky, as opposed to the
main problem this thread is about where something suddenly goes far too
wonky for domUs under the more recent Xen versions after 7 to 8 days of
uptime, and where prior to that everything runs perfectly with no hint
of any problem whatsoever.

So I think my conclusion at the moment is that there's something
happening with the RDTSC emulation, at least with tsc_mode=default,
whereby suddenly a value is returned that causes the NetBSD clock to
jump so wildly that NTPd immediately gives up.  Unfortunately I'm not
seeing anything obvious about what happened in the logs from ntpd, nor
in its state after the fact, when this occurs.  Given FreeBSD's ability to
withstand this event my current guess is that there's something wrong
with the TSC frequency scaling code in NetBSD, but I'm at a total loss
as to why it fails with only some versions of Xen.

I'm still waiting to see if there's any difference with
tsc_mode=always_emulate.  That is being tested with a stock NetBSD-10
install and with NTPd using the pool servers.  The only thing I forgot
to adjust was ntpd's log levels, so I won't see any clock_sync or
no_sys_peer messages.

--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>
