tech-kern archive


Re: PHP performance on Xen domU with mulitple vcpu



At Sun, 6 Apr 2025 00:15:18 +0000, Emmanuel Dreyfus <manu%netbsd.org@localhost> wrote:
Subject: Re: PHP performance on Xen domU with mulitple vcpu
>
> On Sat, Apr 05, 2025 at 04:02:28PM -0700, Greg A. Woods wrote:
> > > Indeed, this is 4.3.1
> > I'm not sure I understand that number.
>
> Sorry, I am not sure how I managed to write that. I meant 4.18.3
> xen_version            : 4.18.3_20240909nb1

Ah, thank you!  That makes so much more sense!

> > That's what I meant -- that with clockinterrupt then ntpd is able to
> > keep the system wallclock time in sync vs. falling out of sync when
> > using xen_system_time.
>
> Yes, this is what happens.

Hmmm....  I've got a few more days to go before my test system has an
uptime long enough to tell if clockinterrupt makes any difference.

I do note that the system is less responsive with that setting.

> > One other question I forgot to ask:  With xen_system_time does
> > timekeeping work OK for the first few days and then go bad, or is it bad
> > right from boot?
>
> ntpq -c kerninfo shows maximum error steadily rising from 16000 by 0.5
> each second.  That starts at boot time, and it never syncs.
>
> # while true; do ntpq -c kerninfo|grep "maximum error"; sleep 1; done
> maximum error:         16326.5
> maximum error:         16327
> maximum error:         16327.5
> maximum error:         16328
> maximum error:         16328.5
> maximum error:         16329

Hmmm... Interesting.

I'm not sure I really understand "maximum error" -- I'm also not sure it
is relevant.

The documentation says:

	5. The synchronization distance, defined as one-half the delay
	plus the dispersion, represents the maximum error statistic.
	The jitter represents the expected error statistic.  The maximum
	error and expected error calculated from the peer variables
	represents the quality metric for the server.  The maximum error
	and expected error calculated from the system variables
	represents the quality metric for the client.

So it's a quality metric.  But I don't really understand what it means
or implies.

I see the same deltas in maxerror as you, both on machines running
without any problem, as well as on the machine I currently have with the
problem; but none of my machines have such large initial numbers -- at
least not after some uptime.  Perhaps your initial value for maxerror is
larger because your system has a larger timestep when it boots?  Mine
was about 27 seconds on the last boot and my maxerror is still under
500.

I think the only way maxerror can change after boot is if something
(ntpd) is calling the ntp_adjtime() system call with MOD_MAXERROR set.

So if that's right then a changing maxerror means ntpd is constantly
providing a new maxerror value to store.
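
One way to watch the value the kernel currently holds, without going
through ntpq, would be to call ntp_adjtime() with no mode bits set, which
only queries the state.  The following is a minimal sketch, assuming a
system that provides <sys/timex.h> with the usual struct timex fields
(maxerror and esterror, documented as microseconds); the wrapper name
read_kernel_maxerror() is made up for illustration:

```c
#include <string.h>
#include <sys/timex.h>

/*
 * Hypothetical wrapper (not from ntpd or the kernel): read the kernel's
 * current maximum and estimated error without changing anything.
 * Returns the clock state reported by ntp_adjtime(), or -1 on failure.
 */
int
read_kernel_maxerror(long *maxerror_us, long *esterror_us)
{
	struct timex tx;
	int state;

	memset(&tx, 0, sizeof(tx));
	tx.modes = 0;			/* no MOD_* bits set: query only */

	state = ntp_adjtime(&tx);
	if (state == -1)
		return -1;		/* errno says why */

	*maxerror_us = (long)tx.maxerror;	/* microseconds */
	*esterror_us = (long)tx.esterror;	/* microseconds */
	return state;
}
```

Calling this once a second and printing the result should show the same
trend as the "ntpq -c kerninfo" loop quoted above, just in the kernel's
own units.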

Indeed in ntpd/ntp_loopfilter.c, just before the call to ntp_adjtime() is
the following:

			ntv.maxerror = usec_long_from_dbl(
				sys_rootdelay / 2 + sys_rootdisp);

(which immediately, on the first call, overwrites the initial maxerror
calculation done in the kernel, I think)

If I'm not mistaken one can query the current rootdelay and rootdisp
with "ntpq -c sysinfo".

The failing Xen-4.20_rc dom0 with an uptime of 5 days:
(also failed with 4.18)

	root delay:         0.045
	root dispersion:    212.300

A working xen-4.13 dom0 with an uptime of 53 days:
(same hardware fails with 4.18)

	root delay:         53.947
	root dispersion:    24.535

A working xen-4.18.0_20231116nb0 dom0 with an uptime of 136 days:

	root delay:         16.979
	root dispersion:    36.968

A non-Xen system with an uptime of 122 days:

	root delay:         64.865
	root dispersion:    44.611
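
As a rough cross-check, the expression ntpd uses for ntv.maxerror (half
the root delay plus the root dispersion) can be applied to each of the
pairs above.  A minimal sketch, assuming ntpq reports both values in
milliseconds; the helper name sync_distance_ms() is invented for
illustration and is not part of ntpd:

```c
#include <assert.h>

/*
 * Hypothetical helper: half the root delay plus the root dispersion,
 * mirroring the expression quoted from ntpd/ntp_loopfilter.c.
 * Both inputs and the result are assumed to be in milliseconds.
 */
double
sync_distance_ms(double root_delay_ms, double root_disp_ms)
{
	assert(root_delay_ms >= 0.0 && root_disp_ms >= 0.0);
	return root_delay_ms / 2.0 + root_disp_ms;
}
```

For the four samples above this works out to roughly 212.3 for the
failing Xen-4.20_rc dom0, 51.5 for the xen-4.13 dom0, 45.5 for the
xen-4.18.0 dom0, and 77.0 for the non-Xen system, so on these numbers the
failing machine differs mainly in its root dispersion.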

Even on that first machine with the problem, ntpd is able to sync and,
after some time, with some clock adjustments and skips, it will stay in
sync for ~7.5 days.  The first "spike_detect" and clock step is about 4
days after boot, but it keeps in sync right up until time goes really
wonky after ~7.5 days.  For the time being I'm letting the Xen watchdog
reboot the system when time goes wonky.

Maybe the delta in maxerror is just telling us the system clock is
"skewed", i.e. not running at the frequency it claims to run at, which
would not be too surprising.
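
To put a number on that delta: the quoted kerninfo output grows by 0.5
per second.  Below is a minimal sketch of the conversion to parts per
million, assuming (the thread does not say so) that ntpq prints maximum
error in milliseconds; the helper name growth_to_ppm() is made up for
illustration:

```c
#include <assert.h>

/*
 * Hypothetical helper: convert the growth of "maximum error" between two
 * samples into parts per million of elapsed time.
 * Assumes the samples are in milliseconds: 1 ms per second is 1000 ppm.
 */
double
growth_to_ppm(double first_ms, double last_ms, double elapsed_s)
{
	assert(elapsed_s > 0.0);
	return (last_ms - first_ms) / elapsed_s * 1000.0;
}
```

For the quoted samples (16326.5 rising to 16329 across five one-second
sleeps) this comes to about 500 ppm under that assumption.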

BTW, I found this:

	https://blog.meinbergglobal.com/2021/02/25/the-root-of-all-timing-understanding-root-delay-and-root-dispersion-in-ntp/

And of course there's also this:

	https://www.ntp.org/documentation/4.2.8-series/stats/#quality-of-service

--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>



