NetBSD-Users archive
Re: Weird network performance problem
Thanks for the good suggestions. I'll go ahead with some tcpdumps.
On Sun, 19 Jan 2020 at 15:49, Greg Troxel <gdt%lexort.com@localhost> wrote:
>
> [lots of details]
>
> These things are somewhat tricky to debug. There can be issues in the
> TCP stacks, issues with interfaces, and issues within the network. I
> have a suspicion that there is something not 100% right about NetBSD's
> TCP retransmit behavior under fairly rare loss conditions, and you may
> be seeing that. If you can reproduce this reliably we could perhaps
> figure it out.
>
> My advice is:
>
> First figure out what's going on with the ethernet-over-powerline
> taken out of the equation.
I have already tried to eliminate it. One of the laptops (marked B,
usually running Fedora but with W10 installed as well) is connected
directly to the same gigabit switch. When it runs Fedora, the iperf3
results to the NetBSD machine are as expected; when it runs W10, they
are about three times slower. This is described in the second part of
my original message. So there is something Windows-specific; for the
moment I will discount the powerline adapters altogether.
>
> It looks like you are using vlan support on Y. Try without also.
That may be something to look at. This is my NVMM host as well; on every
boot I recreate tap[0..5] for use by the NVMM guests (but the tests
were done without any of them running).
I am not using vlans deliberately - the switch upstairs is a dumb one,
although the one downstairs is managed and has (unused at the moment)
vlan support. The interfaces are created simply with /etc/ifconfig.wm0
- just 'inet 192.168.0.29 netmask 255.255.255.0 up description "My
LAN"' and /etc/ifconfig.bridge0 -
create
!ifconfig tap0 create up description "LxMint"
!ifconfig tap1 create up description "MXLinux"
!ifconfig tap2 create up description "FreeBSD12"
!ifconfig tap3 create up description "NBSDc"
!ifconfig tap4 create up description "OpenBSD"
!ifconfig tap5 create up description "Windows10"
!brconfig $int add wm0
!brconfig $int add tap0
!brconfig $int add tap1
!brconfig $int add tap2
!brconfig $int add tap3
!brconfig $int add tap4
!brconfig $int add tap5
up
so whatever is the default in these conditions is used.
>
> Do some iperf3 testing with UDP. This should more or less separate
> loss from TCP's behavior in response to loss. I am unclear on how
> iperf3 deals with this, but it seems obvious that it can tell you what
> fraction of the UDP packets it sent ended up arriving.
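A minimal sketch of such a UDP run, assuming `iperf3 -s` is already
running on the NetBSD machine. The helper name, the IPERF3 variable and
the rates are only illustrative; -u, -b and -t are standard iperf3 client
options. In UDP mode the report includes a Lost/Total Datagrams column,
which should give the loss fraction directly.

```shell
# Hypothetical helper: run one UDP test per target rate against a server.
# IPERF3 can be pointed at a stub for a dry run.
IPERF3=${IPERF3:-iperf3}

udp_sweep() {
    host=$1; shift
    for rate in "$@"; do
        # -u: UDP instead of TCP, -b: target bitrate, -t: duration in seconds
        "$IPERF3" -c "$host" -u -b "$rate" -t 10
    done
}

# Example (not run here): udp_sweep ymir.lorien.lan 100M 500M 900M
```

Stepping the rate up should show roughly where loss starts, independent
of how TCP reacts to it.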
>
> [not easy but worth it] install graphics/xplot-devel. Read the info
> about tcp plots. Capture the data with tcpdump at the NetBSD server
> end (with -w to a file). More generally, capture data at the host
> that is slow in transmitting; this gets that host's view of the
> arriving acks. Process the tcpdump output with tcpdump2xplot,
> probably having to debug and fix the perl script to account for drift
> in tcpdump format over time. Or perhaps use a netbsd-5 tcpdump to
> decode. Then, learn how to read the plots, and look at the data.
> This will let you see what packet loss there is, and how the TCP
> sender responds to it.
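A sketch of the capture step, assuming the interface is wm0 and iperf3
is on its default port 5201 (both as in this thread). The helper name,
the TCPDUMP variable, the output path and the 128-byte snap length are
my own choices; headers should be enough for looking at sequence numbers
and acks.

```shell
# Hypothetical helper: capture iperf3 TCP traffic to a file for later analysis.
# Usually needs root. TCPDUMP can be pointed at a stub for a dry run.
TCPDUMP=${TCPDUMP:-tcpdump}

capture_iperf() {
    # $1: interface, $2: output file
    # -n: no name resolution, -s 128: keep only the first 128 bytes of each
    # packet, -w: write raw packets to a file instead of decoding them
    "$TCPDUMP" -n -i "$1" -s 128 -w "$2" tcp port 5201
}

# Example (not run here):
#   capture_iperf wm0 /tmp/iperf3.pcap     # stop with ^C after the test
#   tcpdump -n -r /tmp/iperf3.pcap | less  # decode the saved capture as text
```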
>
> I can help you offlist with the xplot stuff, as I already understand
> this (my grad school officemate's thesis project). It's on my todo list
> to update the parsing code to cope with more modern tcpdump, which I
> hope will stop rototilling the formats.
>
> One thing you said seemed odd:
>
> > I test the network speed using iperf3 on all these boxes. The speeds
> > upstairs, where all the machines are connected to the gigabit switch,
> > are roughly consistent - I get some 930Mbps both ways (there is a bit
> > of a speed ramp-up when the server is the NetBSD laptop, but after the
> > fifth or so transfer it gets to the same rates). The speeds are also
>
> Can you explain this more precisely, and maybe post a few summary lines?
> This doesn't really make sense to me. Any given TCP connection has to
> ramp up the congestion window, but I wouldn't expect a second one 30s
> later to benefit from the first -- but maybe there is some caching of
> RTT or something else? After the speeds improve, how long can you wait
> before another test that is back to slower? Going way out on a limb,
> this smells like caching of some parameters that leads to better
> handling of packet loss, and the real issue is that the loss shouldn't
> be happening.
From the XCP-NG host to the NetBSD laptop:
$ iperf3 -c ymir.lorien.lan
Connecting to host ymir.lorien.lan, port 5201
[ 4] local 192.168.0.5 port 36036 connected to 192.168.0.29 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 45.9 MBytes 385 Mbits/sec 0 66.5 KBytes
[ 4] 1.00-2.00 sec 64.2 MBytes 539 Mbits/sec 0 100 KBytes
[ 4] 2.00-3.00 sec 81.3 MBytes 682 Mbits/sec 0 132 KBytes
[ 4] 3.00-4.00 sec 99.4 MBytes 834 Mbits/sec 0 163 KBytes
[ 4] 4.00-5.00 sec 109 MBytes 911 Mbits/sec 0 205 KBytes
[ 4] 5.00-6.00 sec 111 MBytes 928 Mbits/sec 0 205 KBytes
[ 4] 6.00-7.00 sec 111 MBytes 928 Mbits/sec 0 205 KBytes
[ 4] 7.00-8.00 sec 111 MBytes 932 Mbits/sec 0 205 KBytes
[ 4] 8.00-9.00 sec 111 MBytes 930 Mbits/sec 0 205 KBytes
[ 4] 9.00-10.00 sec 111 MBytes 932 Mbits/sec 0 205 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 954 MBytes 800 Mbits/sec 0 sender
[ 4] 0.00-10.00 sec 953 MBytes 800 Mbits/sec receiver
It starts a bit slower, but after the fourth interval it gets close to
the maximum.
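To put a number on the ramp-up, here is a small sketch that averages the
per-interval rates from iperf3 client output. The function name and the
optional skip argument are hypothetical, and it assumes the rates are
printed in Mbits/sec as in the run above.

```shell
# Hypothetical helper: mean of the per-interval Mbits/sec values on stdin.
# Optional $1: number of leading intervals to leave out (the ramp-up).
mean_mbps() {
    awk -v skip="${1:-0}" '
        # Interval lines only; the sender/receiver summary lines are ignored.
        /Mbits\/sec/ && !/sender|receiver/ {
            if (seen++ < skip) next
            for (i = 2; i <= NF; i++)
                if ($i == "Mbits/sec") { sum += $(i - 1); n++ }
        }
        END { if (n) printf "%.0f\n", sum / n }
    '
}

# Example (not run here): iperf3 -c ymir.lorien.lan | mean_mbps 5
```

Fed the ten interval lines of the run above, it gives 800 over the whole
run, matching the summary line, and 930 when the first five intervals are
skipped.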
When the server is the B laptop running W10, I get:
$ iperf3 -c brutus.lorien.lan
Connecting to host brutus.lorien.lan, port 5201
[ 4] local 192.168.0.5 port 43654 connected to 192.168.0.36 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 106 MBytes 885 Mbits/sec 0 220 KBytes
[ 4] 1.00-2.00 sec 108 MBytes 902 Mbits/sec 0 220 KBytes
[ 4] 2.00-3.00 sec 112 MBytes 938 Mbits/sec 0 220 KBytes
[ 4] 3.00-4.00 sec 111 MBytes 934 Mbits/sec 0 220 KBytes
[ 4] 4.00-5.00 sec 112 MBytes 935 Mbits/sec 0 220 KBytes
[ 4] 5.00-6.00 sec 112 MBytes 941 Mbits/sec 0 220 KBytes
[ 4] 6.00-7.00 sec 112 MBytes 941 Mbits/sec 0 220 KBytes
[ 4] 7.00-8.00 sec 109 MBytes 917 Mbits/sec 0 220 KBytes
[ 4] 8.00-9.00 sec 112 MBytes 943 Mbits/sec 0 220 KBytes
[ 4] 9.00-10.00 sec 112 MBytes 942 Mbits/sec 0 220 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.08 GBytes 928 Mbits/sec 0 sender
[ 4] 0.00-10.00 sec 1.08 GBytes 928 Mbits/sec receiver
i.e. the speed is close to the maximum from the start.
The lack of symmetry is strange: from NetBSD to W10 it is full speed;
from W10 to NetBSD it is about a third... At the same time there is no
significant difference if you replace W10 with Linux or FreeBSD; both
directions are similar. And it can't be blamed on iperf3 on W10 alone:
when the server is Linux or FreeBSD, the speed is as expected.
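One way to compare the two directions without swapping client and server
roles is iperf3's reverse mode. A sketch, with a hypothetical helper name
and IPERF3 variable; -R is a standard iperf3 client option that makes the
server send and the client receive.

```shell
# Hypothetical helper: measure both directions from the same client.
# IPERF3 can be pointed at a stub for a dry run.
IPERF3=${IPERF3:-iperf3}

both_ways() {
    # Default direction: the client sends, the server receives.
    "$IPERF3" -c "$1" -t 10
    # -R reverses the roles: the server sends, the client receives.
    "$IPERF3" -c "$1" -t 10 -R
}

# Example (not run here): both_ways ymir.lorien.lan
```

Running this from the W10 laptop against the NetBSD server should show
whether only one direction is slow while both endpoints stay the same.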