Sometime in the last 8-12 hours something went sideways with my
timekeeping again. The NetBSD domUs on my two local Xen servers lost
clock sync and ntpd is no longer able to keep them in line. Rebooting
one of them didn't help. It started drifting far before ntpd could even
get well enough connected to its server.

Oddly the dom0s on both machines are still keeping good time.

To me this seems to suggest the Xen kernel itself has lost clock sync
somehow and the timecounter it supplies via xen_system_time is no longer
stable, but...

At Fri, 08 Mar 2024 06:56:49 +0000, "Mathew, Cherry G.*" <c%bow.st@localhost> wrote:
Subject: Re: timekeeping regression?
>
> This seems to strengthen my suspicion that our dom0/Xen interaction is a
> bit under specified - for eg: dom0 is able to directly access privileged
> hardware state such as apic registers, and ownership state isn't
> necessarily clear (last I checked, which was a while ago) - so if our
> dom0 is assuming a "native" path to reading system clock state, and
> there is an existing Xen mechanism to maintain and export canonical
> clock state, that's the obvious fix to this problem.

I thought dom0 should be keeping its own time via ntpd and, if I
understand right, it's updating the Xen kernel's wallclock time
periodically. See sys/xen/xen/xen_clock.c:xen_timepush_intr() et al.

dom0 is definitely not using any native clock either:

	# sysctl kern.timecounter.hardware
	kern.timecounter.hardware = xen_system_time

I don't know if there's a relationship between the Xen kernel's
wallclock time and what it's supplying as a timecounter via
xen_system_time. I would think there shouldn't be, but I dunno. I'm
not sure what Xen needs in terms of wallclock time.

In theory the timecounter used by dom0 should be exactly the same as the
one used by any (NetBSD) domU running on the same machine, i.e. the one
supplied by Xen via xen_system_time.
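
One way to check that claim across several guests is to collect the
`sysctl kern.timecounter.hardware` line from each and compare them. The
following is only a minimal sketch of that comparison, not anything
taken from the NetBSD or Xen sources. The helper names
(parse_timecounter, uses_xen_timecounter) are hypothetical, and treating
the two names quoted in this thread as "the Xen timecounter" is an
assumption. It accepts both the `name = value` style shown above and the
`name: value` style that FreeBSD's sysctl prints.

```python
import re

# Timecounter names quoted in this thread. Treating these two as
# "the Xen-supplied timecounter" is an assumption of this sketch.
XEN_TIMECOUNTERS = {"xen_system_time", "XENTIMER"}


def parse_timecounter(line):
    """Return the value from one line of `sysctl kern.timecounter.hardware`.

    Accepts both the `name = value` and the `name: value` output styles.
    """
    m = re.match(r"\s*kern\.timecounter\.hardware\s*[=:]\s*(\S+)\s*$", line)
    if m is None:
        raise ValueError("unrecognised sysctl output: %r" % line)
    return m.group(1)


def uses_xen_timecounter(line):
    """True if the reported timecounter is one of the assumed Xen names."""
    return parse_timecounter(line) in XEN_TIMECOUNTERS


if __name__ == "__main__":
    samples = (
        "kern.timecounter.hardware = xen_system_time",
        "kern.timecounter.hardware: XENTIMER",
    )
    for sample in samples:
        print(parse_timecounter(sample), uses_xen_timecounter(sample))
```

This only confirms which timecounter each system has selected. It says
nothing about whether the values that timecounter returns are stable.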

I note too that FreeBSD is probably using the same Xen timecounter as
its timecounter:

	kern.timecounter.hardware: XENTIMER

But somehow my domU running FreeBSD is keeping better time -- maybe it's
just that its ntpd has been lucky and been able to keep things in sync.

Maybe I'll reboot again and go down to just one pinned vCPU in dom0, at
least in one of the servers -- I don't really need that much horsepower
in its dom0.

-- 
Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675
RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>
Avoncote Farms <woods%avoncote.ca@localhost>