NetBSD-Bugs archive
kern/55800: Data transfers stall when SACK is enabled
>Number: 55800
>Category: kern
>Synopsis: Data transfers stall when SACK is enabled
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Nov 10 08:40:00 +0000 2020
>Originator: kim%netbsd.org@localhost (Kimmo Suominen)
>Release: NetBSD 9.99.75 (202011081900Z)
>Organization:
>Environment:
System: NetBSD rendez-vous.gw.fi 9.99.75 NetBSD 9.99.75 (GENERIC) #0: Sun Nov 8 18:27:14 UTC 2020 mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
Transferring files using rsync over ssh stalls after about 1 GB
of data has been transferred. (The stall might not be connected
with the amount of data, though.) The connection is over IPv4.
When the transfer stalls, there is always some unresolved SACK.
I also observed regular bouts of SACK throughout the transfer,
so not all occurrences of SACK result in a stall.
In the stalled state it looks like ssh is not getting any data
through (and therefore rsync is not receiving anything). I have
tcpdump output available here:
https://www.netbsd.org/~kim/NB-RSYNC-PROBLEM.txt
The last transfer stalled at 1:27. Then there are some packets
exchanged at 2:27 and 3:27. At 4:27 the connection is closed.
This would appear to match the sshd_config settings I have:
    TCPKeepAlive no
    ClientAliveInterval 3600
    ClientAliveCountMax 3
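
The arithmetic behind that match can be sketched as follows. This is only a minimal illustration, assuming the times above are hours and minutes on the same clock and that one event happens per ClientAliveInterval after the stall. The function name hourly_marks is hypothetical, and the sketch does not model which probe actually triggers the disconnect inside OpenSSH.

```python
from datetime import datetime, timedelta

# Values taken from the sshd_config settings quoted in the report.
CLIENT_ALIVE_INTERVAL = 3600  # seconds
CLIENT_ALIVE_COUNT_MAX = 3


def hourly_marks(stall_time, interval_s, count):
    """Return the clock times of `count` successive intervals after the stall.

    Hypothetical helper: it only adds multiples of the interval to the
    stall time and formats the result as H:MM.
    """
    start = datetime.strptime(stall_time, "%H:%M")
    marks = []
    for k in range(1, count + 1):
        t = start + timedelta(seconds=interval_s * k)
        marks.append(f"{t.hour}:{t.minute:02d}")
    return marks


if __name__ == "__main__":
    # Stall observed at 1:27 in the tcpdump output.
    for mark in hourly_marks("1:27", CLIENT_ALIVE_INTERVAL, CLIENT_ALIVE_COUNT_MAX):
        print(mark)
```

With the stall at 1:27, the three marks fall on 2:27, 3:27 and 4:27, which are the times at which packets were seen and the connection was closed.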
The output on the terminal running rsync is as follows:
    Timeout, server equinoxe not responding.
    rsync: connection unexpectedly closed (949438584 bytes received so far) [receiver]
    rsync error: error in rsync protocol data stream (code 12) at io.c(228) [receiver=3.2.3]
    rsync: connection unexpectedly closed (14688411 bytes received so far) [generator]
    rsync error: unexplained error (code 255) at io.c(228) [generator=3.2.3]
    rsync: [generator] write error: Broken pipe (32)
I'm guessing the first line is from ssh, although I have not
verified that.
The remote side is running the NetBSD 9.1 release:
    NetBSD 9.1 (GENERIC) #0: Sun Oct 18 19:24:30 UTC 2020
The local side is running the most recent -current snapshot:
    NetBSD 9.99.75 (GENERIC) #0: Sun Nov 8 18:27:14 UTC 2020
When I first noticed the issue I was running a slightly older
-current (build ID derived from CVS checkout timestamp):
    NetBSD 9.99.74 (GENERIC.202010172211Z~GW) #1: Sun Oct 18 02:20:50 EEST 2020
>How-To-Repeat:
This is the command I ran:
    rsync -aHSs --delete --exclude /branch/ --exclude /daily/ \
        --exclude /git/ --exclude /hg/ --exclude /releases/ \
        --exclude /work/ --exclude /www/ equinoxe:/p/netbsd/ \
        /p/netbsd/
Possibly any data transfer with enough data will do.
>Fix:
A successful workaround was to disable SACK on the local side:
    sysctl -w net.inet.tcp.sack.enable=0
This transfer was using IPv4, but I did also disable IPv6 SACK:
    sysctl -w net.inet6.tcp6.sack.enable=0
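
To confirm that the workaround is in effect before repeating the transfer, the two settings can be read back and checked. The following is a rough sketch only: it assumes sysctl prints each variable as "name = value", and the helper names parse_sysctl_line and sack_disabled are hypothetical. The sample lines are illustrative, not captured from the affected machine.

```python
# The two variables changed by the workaround above.
SACK_VARIABLES = ("net.inet.tcp.sack.enable", "net.inet6.tcp6.sack.enable")


def parse_sysctl_line(line):
    """Split one 'name = value' line into (name, int value).

    Assumes the 'name = value' output format; raises ValueError otherwise.
    """
    name, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"unexpected sysctl output: {line!r}")
    return name.strip(), int(value.strip())


def sack_disabled(lines):
    """Return True only if both SACK variables are present and set to 0."""
    seen = dict(parse_sysctl_line(l) for l in lines if l.strip())
    return all(seen.get(name) == 0 for name in SACK_VARIABLES)


if __name__ == "__main__":
    # Illustrative input, as if pasted from running sysctl on both names.
    sample = [
        "net.inet.tcp.sack.enable = 0",
        "net.inet6.tcp6.sack.enable = 0",
    ]
    print("SACK disabled:", sack_disabled(sample))
```

A missing variable is treated the same as an enabled one, so the check fails safe if only one of the two settings was changed.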