Port-xen archive
Re: xennet performance collapses with multiple vCPU
Hi,
On Mon, Apr 13, 2020 at 01:44:10PM +0200, Manuel Bouyer wrote:
> On Sun, Apr 05, 2020 at 08:57:45PM +0000, Andrew Doran wrote:
> > > [...]
> > > I've now tracked it down to this change:
> > >
> > > Module Name: src
> > > Committed By: ad
> > > Date: Mon Jan 13 20:30:08 UTC 2020
> > >
> > > Modified Files:
> > > src/sys/kern: subr_cpu.c
> > >
> > > Log Message:
> > > Fix some more bugs in the topo stuff, that prevented it from working
> > > properly with fake topo info + MP.
> > >
> > >
> > > To generate a diff of this commit:
> > > cvs rdiff -u -r1.10 -r1.11 src/sys/kern/subr_cpu.c
> > >
> > > After this change the DomU even boots visibly more slowly. Maybe this
> > > change makes the scheduler use all CPUs on MP systems, but introduces too
> > > much switching between them? Andy, can you have a look?
> > >
> > > Meanwhile, I'll check whether there is anything obvious in the fake topology code.
> >
> > I spent some time looking into this over the weekend. It's easily
> > reproducible, and I don't see anything that looks strange on the various
> > systems involved. I also don't see why it would be related to the
> > scheduler.
>
> Hello,
> here is some more data on this issue. With ping I see a consistent 10ms delay:
> PING nephtys.lip6.fr (195.83.118.1): 56 data bytes
> 64 bytes from 195.83.118.1: icmp_seq=0 ttl=253 time=8.964715 ms
> 64 bytes from 195.83.118.1: icmp_seq=1 ttl=253 time=10.080450 ms
> 64 bytes from 195.83.118.1: icmp_seq=2 ttl=253 time=10.079291 ms
> 64 bytes from 195.83.118.1: icmp_seq=3 ttl=253 time=10.079525 ms
> 64 bytes from 195.83.118.1: icmp_seq=4 ttl=253 time=10.083389 ms
> 64 bytes from 195.83.118.1: icmp_seq=5 ttl=253 time=10.080444 ms
> 64 bytes from 195.83.118.1: icmp_seq=6 ttl=253 time=10.079615 ms
> 64 bytes from 195.83.118.1: icmp_seq=7 ttl=253 time=10.081661 ms
>
> Sometimes it drops to 5ms and stays there.
>
> With a single CPU, the RTT is less than one millisecond.
> Keeping both CPUs busy with a while(1) loop doesn't help.
>
> It looks like something is being delayed until the next clock tick.
> Note that the dom0 is idle and no other VMs are running.
>
> I'm seeing the same behavior in the bouyer-xenpvh branch, where Xen
> now has fast softints and kpreempt. Disabling the latter, or both, doesn't
> change anything. I'm also seeing the same with a kernel from
> bouyer-xenpvh-base, so it's not related to changes in the branch.
>
> Any ideas welcome.
This confirms my suspicion and is why I wanted to play with HZ. I think
soft interrupt processing is being driven off the clock interrupt. Maybe
hardware interrupt processing, but I think that's less likely. Native x86
configured the same way does not behave like this. Hmm. I will play around
with it some more.
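
To put rough numbers on the tick theory: with the default HZ=100 a tick is
10ms, and ping(8)'s default 1 second interval is an exact multiple of that,
so if replies are only handed up at the next clock tick, every packet pays
the same phase-dependent penalty. That would account for both the flat
~10ms column above and the occasional stable 5ms. Here is a minimal
userland sketch of that model (the defer-to-next-tick behaviour is the
hypothesis, not confirmed kernel behaviour, and the program is purely
illustrative):

#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define HZ	100			/* NetBSD default */
#define TICK_MS	(1000.0 / HZ)		/* 10ms per clock tick */
#define PING_MS	1000.0			/* ping(8) default interval */

/* Round a timestamp up to the next clock tick boundary. */
static double
next_tick(double t_ms)
{
	return ceil(t_ms / TICK_MS) * TICK_MS;
}

int
main(void)
{
	/* Random phase of the first ping relative to the tick. */
	double phase = (double)arc4random() / UINT32_MAX * TICK_MS;

	for (int seq = 0; seq < 8; seq++) {
		double sent = phase + seq * PING_MS;
		/* Model: the reply is only processed at the next tick. */
		double penalty = next_tick(sent) - sent;
		printf("icmp_seq=%d modelled tick penalty %.3f ms\n",
		    seq, penalty);
	}
	return 0;
}

Build with -lm; because the ping interval divides evenly into ticks, every
modelled sequence number prints the same penalty, just like the flat RTT
column. If this is right, a domU kernel built with "options HZ=1000"
should shrink the plateau toward 1ms.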
Andrew