Port-xen archive


Re: timekeeping regression?



At Wed, 19 Jun 2024 15:34:10 -0700, "Greg A. Woods" <woods%planix.ca@localhost> wrote:
Subject: Re: timekeeping regression?
>
> After all it does have to have something to do with the Xen hypervisor
> as I didn't see time warps in domUs either until I upgraded to 4.18.

So, like clockwork, the one machine I have had running again since its
last reboot, now for nearly 8 days, has exhibited time drift in its NetBSD domUs.

This is the uptime of the dom0:

xentastic          up   7+20:55,     0 users,  load 0.00, 0.00, 0.00

It runs the following domUs:

Name                            ID   Mem VCPUs      State   Time(s)
Domain-0                         0 12288     8     r-----   23004.7
fezzik                           2  3936     4     -b----    2870.7
nb10                             3  2000     4     -b----    2812.4
nbtest                           9  2000     4     -b----   10413.1

fezzik             up   7+20:56,     1 user,   load 0.79, 0.78, 0.75
nb10               up   7+20:51,     0 users,  load 0.01, 0.04, 0.00
nbtest             up   6+23:08,     0 users,  load 0.04, 0.04, 0.00

Nb10 is stock NetBSD-10.0 (XEN3_DOMU).

Fezzik, the FreeBSD-14 domU, is running A-OK with near-perfect time.

Note that nbtest, running an ancient -current but with xen_clock.c:1.18,
was rebooted after the machine was brought up and still hasn't exceeded
the magic ~7.5 days of uptime, yet it is experiencing time drift.

Nbtest doesn't show anything really new in the event counters -- they
are still running at about the same rates as before the time skew.

vcpu0 missed hardclock                                421557923  700 intr
vcpu0 timecounter went backwards                       42769579   71 intr
vcpu1 missed hardclock                                260820284  433 intr
vcpu1 timecounter went backwards                         395662    0 intr
vcpu2 missed hardclock                                244147923  405 intr
vcpu2 timecounter went backwards                         359912    0 intr
vcpu3 missed hardclock                                262151092  435 intr
vcpu3 timecounter went backwards                         329019    0 intr


The only new clue here is that the domU's own uptime doesn't seem to
matter; it's the whole machine's uptime that matters.  The two NetBSD
domUs, whose uptimes differ by about 21 hours, were only about an hour
and a half apart when mDNSResponder first detected time skew:

Jun 24 07:11:50 nb10 mDNSResponder: mDNS_Execute,5348: mDNSPlatformRawTime went backwards by 758 ticks; setting correction factor to 751771850

Jun 24 08:45:53 nbtest mDNSResponder: mDNSCoreReceive,10552: mDNSPlatformRawTime went backwards by 370 ticks; setting correction factor to 3970002857


Note that mdnsd currently uses gettimeofday(), although it does have code
to use clock_gettime() with CLOCK_MONOTONIC.  Either way, as built it is
correctly observing the time-of-day skew.


Those "timecounter went backwards" events, especially at a relatively
high and constant rate on vcpu0, may be concerning.

Something I may have mentioned before: FreeBSD has a much simpler
tc_get_timecount implementation, with the following comment:

	/*
	 * We don't disable preemption here because the worst that can
	 * happen is reading the vcpu_info area of a different CPU than
	 * the one we are currently running on, but that would also
	 * return a valid tc (and we avoid the overhead of
	 * critical_{enter/exit} calls).
	 */

--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>



