Port-xen archive
Re: Xen timecounter issues
>>>>> On Mon, 24 Jun 2024 10:48:31 +0200, Manuel Bouyer <bouyer%antioche.eu.org@localhost> said:
> On Mon, Jun 24, 2024 at 04:42:19AM -0400, Brad Spencer wrote:
>> Manuel Bouyer <bouyer%antioche.eu.org@localhost> writes:
>>
>> > On Sun, Jun 23, 2024 at 01:58:36PM +0000, Taylor R Campbell wrote:
>> >> It came to my attention today that there has been a lot of discussion
>> >> recently about timecounters on Xen:
>> >>
>> >> https://mail-index.netbsd.org/port-xen/2024/02/01/msg010525.html
>> >> https://mail-index.netbsd.org/port-xen/2024/06/20/msg010573.html
>> >>
>> >> These threads are long and I wasn't following because I'm not
>> >> subscribed to port-xen, but since I wrote xen_clock.c and I remember
>> >> roughly how it works, maybe my input might be helpful. Is there a
>> >> summary of the issues?
>> >>
>> >> 1. Is there an array of the following variables?
>> >>
>> >> - dom0 kernel (netbsd-8, netbsd-9, netbsd-10, linux, freebsd, ...)
>> >> - domU kernel, if misbehaviour observed in domU (ditto)
>> >> - Xen kernel version
>> >> - virtualization type (pv/pvh/hvm/...)
>> >> - Xen TSC configuration
>> >> - physical CPU (and, whether the physical CPU has invariant TSC)
>> >> - misbehaviour summary
>> >
>> > AFAIK no. From what I understood, the misbehavior is only seen in dom0.
>> > All I can say is that I've run NetBSD Xen dom0 on various generations of
>> > Intel CPUs (from P4 to quite recent Xeons) and I have never had any issue
>> > with timekeeping in dom0 (all my dom0s run ntpd).
>>
>> Another factor might be the number of vCPUs allocated to Domain-0. I
>> use only 1 and have no trouble with timekeeping on two Intel i7/i8
>> systems and one very old AMD Athlon II. One of the other reporters is
>> using more than one vCPU with Domain-0, is having trouble with
>> timekeeping, and has found that CPU pinning solves the problem. I am
>> also running Xen 4.15 and he is running 4.18 (I believe).
> I'm switching from 4.15 to 4.18, and with netbsd-10 I'm running
> dom0 with all available CPUs (and I have done so on my test machine
> running -current for some time now).
> I don't think the number of vCPUs is the factor here, as even with one vCPU
> it's not pinned to a physical CPU.
The Xen hypervisor's pinning logic at dom0 boot time is special-cased for
dom0/PV (which is the only mode we use, last I checked), and pins vcpu0 to
the BSP.
This means that, until we spin up the rest of the AP vCPUs, vcpu0 stays
pinned to the underlying BSP. Note that our probe logic only spins up as
many vCPUs as there are underlying pCPUs - however, without pinning
specified, the additional vCPUs are free to be scheduled onto other pCPUs.
That is probably what triggered the TSC drift Greg observed on hardware
without an invariant TSC, since a sequence of reads is not guaranteed to
be made on the same underlying pCPU. This is a symptom of Xen's poor API
abstraction for h/w resource sharing on dom0 - I think this was fixed for
newer modes later on, but I'm not up to date on that.
Here's the relevant code snippet:
xen/common/sched/core.c:sched_init_vcpu()
...
    else if ( pv_shim && v->vcpu_id == 0 )
    {
        /*
         * PV-shim: vcpus are pinned 1:1. Initially only 1 cpu is online,
         * others will be dealt with when onlining them. This avoids pinning
         * a vcpu to a not yet online cpu here.
         */
        sched_set_affinity(unit, cpumask_of(0), cpumask_of(0));
    }
    else if ( d->domain_id == 0 && opt_dom0_vcpus_pin )
    {
        /*
         * If dom0_vcpus_pin is specified, dom0 vCPUs are pinned 1:1 to
         * their respective pCPUs too.
         */
        sched_set_affinity(unit, cpumask_of(processor), &cpumask_all);
    }
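
To make the failure mode concrete, here's a small illustrative C sketch
(hypothetical, not code from xen_clock.c; rdtsc() and tsc_went_backwards()
are made-up names) of why back-to-back TSC reads from an unpinned vCPU can
appear to go backwards when the underlying pCPUs' TSCs aren't synchronized:

    /*
     * Illustration only: two back-to-back TSC reads from a vCPU that may
     * be rescheduled onto a different pCPU in between.  Without an
     * invariant/synchronized TSC across pCPUs, the second read can be
     * smaller than the first, i.e. the timecounter appears to run
     * backwards.
     */
    #include <stdint.h>

    static inline uint64_t
    rdtsc(void)
    {
            uint32_t lo, hi;

            __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
            return ((uint64_t)hi << 32) | lo;
    }

    int
    tsc_went_backwards(void)
    {
            uint64_t first = rdtsc();
            /* vCPU may migrate to another pCPU here if not pinned. */
            uint64_t second = rdtsc();

            return second < first;
    }

Pinning - either booting Xen with dom0_vcpus_pin (the opt_dom0_vcpus_pin
branch above) or doing it by hand with xl vcpu-pin - keeps each vCPU on a
fixed pCPU, which would be consistent with the observation that pinning
makes the timekeeping problem go away.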
--
Math/(~cherry)