At Sun, 6 Apr 2025 00:15:18 +0000, Emmanuel Dreyfus <manu%netbsd.org@localhost> wrote:
Subject: Re: PHP performance on Xen domU with mulitple vcpu
>
> On Sat, Apr 05, 2025 at 04:02:28PM -0700, Greg A. Woods wrote:
> > > Indeed, this is 4.3.1
> > I'm not sure I understand that number.
>
> Sorry, I am not sure how I managed to write that. I meant 4.18.3
> xen_version : 4.18.3_20240909nb1

Ah, thank you!  That makes so much more sense!

> > That's what I meant -- that with clockinterrupt then ntpd is able to
> > keep the system wallclock time in sync vs. falling out of sync when
> > using xen_system_time.
>
> Yes, this is what happens.

Hmmm....  I've got a few more days to go before my test system has an
uptime long enough to tell if clockinterrupt makes any difference.

I do note that the system is less responsive with that setting.

> > One other question I forgot to ask:  With xen_system_time does
> > timekeeping work OK for the first few days and then go bad, or is it bad
> > right from boot?
>
> ntpq -c kerninfo shows maximum error steadily rising from 16000 by 0.5
> each second. That starts at boot time, and it never syncs.
>
> # while true; do ntpq -c kerninfo|grep "maximum error"; sleep 1; done
> maximum error: 16326.5
> maximum error: 16327
> maximum error: 16327.5
> maximum error: 16328
> maximum error: 16328.5
> maximum error: 16329

Hmmm... Interesting.

I'm not sure I really understand "maximum error" -- I'm also not sure it
is relevant.

The documentation says:

    5. The synchronization distance, defined as one-half the delay plus
       the dispersion, represents the maximum error statistic.  The
       jitter represents the expected error statistic.  The maximum
       error and expected error calculated from the peer variables
       represents the quality metric for the server.  The maximum error
       and expected error calculated from the system variables
       represents the quality metric for the client.

So it's a quality metric.  But I don't really understand what it means
or implies.
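
As a rough illustration of the definition quoted above (one-half the
delay plus the dispersion), here is a minimal sketch that applies it to
the four "ntpq -c sysinfo" readings reported further down in this
message.  The function name and the sample table are hypothetical, and
it assumes both inputs are in milliseconds, which is how ntpq normally
displays them; it is not a copy of what ntpd itself does.

```python
def sync_distance_ms(root_delay_ms: float, root_dispersion_ms: float) -> float:
    """Return half the root delay plus the root dispersion (milliseconds).

    Assumption: both inputs are already in milliseconds.
    """
    return root_delay_ms / 2.0 + root_dispersion_ms


# (label, root delay, root dispersion) taken from the sysinfo readings
# listed later in this message; the labels are only for display.
SYSINFO_SAMPLES = [
    ("failing Xen-4.20_rc dom0", 0.045, 212.300),
    ("working xen-4.13 dom0", 53.947, 24.535),
    ("working xen-4.18.0 dom0", 16.979, 36.968),
    ("non-Xen system", 64.865, 44.611),
]

for label, delay, dispersion in SYSINFO_SAMPLES:
    distance = sync_distance_ms(delay, dispersion)
    print(f"{label:26s} {distance:9.4f} ms")
```

With these inputs the failing machine's distance is almost entirely
dispersion, while on the other three the delay term contributes a
noticeable share.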

I see the same deltas in maxerror as you, both on machines running
without any problem, as well as on the machine I currently have with the
problem; but none of my machines have such large initial numbers -- at
least not after some uptime.

Perhaps your initial value for maxerror is larger because your system
has a larger timestep when it boots?  Mine was about 27 seconds on the
last boot and my maxerror is still under 500.

I think the only way maxerror can change after boot is if something
(ntpd) is calling the ntp_adjtime() system call with MOD_MAXERROR set.
So if that's right then a changing maxerror means ntpd is constantly
providing a new maxerror value to store.

Indeed in ntpd/ntp_loopfilter.c, just before the call to ntp_adjtime()
is the following:

    ntv.maxerror = usec_long_from_dbl(
        sys_rootdelay / 2 + sys_rootdisp);

(which immediately, on the first call, overwrites the initial maxerror
calculation done in the kernel, I think)

If I'm not mistaken one can query the current rootdelay and rootdisp
with "ntpq -c sysinfo".

The failing Xen-4.20_rc dom0 with an uptime of 5 days:
(also failed with 4.18)

    root delay:        0.045
    root dispersion:   212.300

A working xen-4.13 dom0 with an uptime of 53 days:
(same hardware fails with 4.18)

    root delay:        53.947
    root dispersion:   24.535

A working xen-4.18.0_20231116nb0 dom0 with an uptime of 136 days:

    root delay:        16.979
    root dispersion:   36.968

A non-Xen system with an uptime of 122 days:

    root delay:        64.865
    root dispersion:   44.611

Even on that first machine with the problem, ntpd is able to sync and,
after some time, with some clock adjustments and skips, it will stay in
sync for ~7.5 days.  The first "spike_detect" and clock step is about 4
days after boot, but it keeps in sync right up until time goes really
wonky after ~7.5 days.  For the time being I'm letting the Xen watchdog
reboot the system when time goes wonky.

Maybe the delta in maxerror is just telling us the system clock is
"skewed", i.e.
not running at the frequency it claims to run at, which would not be too
surprising.

BTW, I found this:

    https://blog.meinbergglobal.com/2021/02/25/the-root-of-all-timing-understanding-root-delay-and-root-dispersion-in-ntp/

And of course there's also this:

    https://www.ntp.org/documentation/4.2.8-series/stats/#quality-of-service

--
Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675
RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>
Avoncote Farms <woods%avoncote.ca@localhost>