All systems are using NetBSD 9.99.51 (XEN3_DOMU) #0: Sun Mar 22 17:35:29 UTC 2020.
Note that increasing the vCPUs from 1 to 4 (2 gives the same result as 4)
drops the throughput to roughly 10% of the single-vCPU figure. No offloads are configured in the domU:
netio same host (1 vCPU both ends):
Packet size 1k bytes: 96509 KByte/s Tx, 92083 KByte/s Rx.
...
netio same host (4 vCPU both ends):
Packet size 1k bytes: 9814 KByte/s Tx, 9443 KByte/s Rx.
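For reference, these numbers are from the netio benchmark (benchmarks/netio
in pkgsrc); assuming the stock tool, a run between two domUs looks something
like this (the domU name is a placeholder):

  server domU: netio -s -t
  client domU: netio -t <server-domu>

The 1k line is the first packet size netio cycles through; the elided lines
above are the larger sizes from the same run.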
FYI, using tcp4csum offload on both ends offers only an incremental increase; on my system it went from:
Packet size 1k bytes: 94505 KByte/s Tx, 88125 KByte/s Rx.
to:
Packet size 1k bytes: 96703 KByte/s Tx, 93536 KByte/s Rx.
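For anyone reproducing this: the offload state is per interface and can be
toggled with ifconfig(8). On the usual domU frontend interface (xennet0 here,
adjust to your setup):

  ifconfig xennet0             (the 'enabled' line shows active offloads)
  ifconfig xennet0 tcp4csum    (enable TCPv4 checksum offload)
  ifconfig xennet0 -tcp4csum   (disable it again)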
I've tried the same on my system, for starters with just two DomUs on the same physical host. I see the exact same ~10x slowdown the moment I change even one of the DomU pair to 2 vCPUs:
Packet size 1k bytes: 9994 KByte/s Tx, 9468 KByte/s Rx.
I see no meaningful CPU load on either the DomUs or the Dom0.
I'll check what's happening. Clearly this is not a CPU load problem. It's possible, for example, that the request coalescing between Dom0 and DomU doesn't work properly with >1 vCPU.
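To give a feel for why broken coalescing could cost ~10x without any visible
CPU load, here is a toy userland model (plain C; the packet count and batch
size are made-up numbers, and this is not the actual xennet(4) code)
comparing the number of backend notifications with and without coalescing:

  #include <stdio.h>

  int
  main(void)
  {
          const long packets = 100000; /* TX requests during a benchmark run */
          const long batch = 32;       /* requests pushed per notify when
                                        * coalescing works */

          /* one event-channel notify per full batch */
          long coalesced = (packets + batch - 1) / batch;
          /* coalescing defeated: one notify (and likely one
           * dom0<->domU switch) per packet */
          long broken = packets;

          printf("coalesced: %ld notifies\n", coalesced);
          printf("broken:    %ld notifies (~%ldx more)\n",
              broken, broken / coalesced);
          return 0;
  }

Each notify is an event-channel kick that usually means a domain switch, so
per-packet notifications could easily dominate the cost even while both
guests look idle.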
Jaromir