Subject: kern/30621: TCP hang on netbsd-2
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org>
From: None <Manuel.Bouyer@lip6.fr>
List: netbsd-bugs
Date: 06/28/2005 10:37:00
>Number: 30621
>Category: kern
>Synopsis: TCP hang on netbsd-2
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jun 28 10:37:00 +0000 2005
>Originator: Manuel.Bouyer@lip6.fr
>Release: NetBSD 2.0_STABLE, cvs from Apr 10
>Organization:
LIP6/ASIM http://www-asim.lip6.fr/
>Environment:
System: NetBSD pop.lip6.fr 2.0_STABLE NetBSD 2.0_STABLE (GENERIC.MP) #0: Sun Apr 10 15:46:42 CEST 2005 root@pop.lip6.fr:/local/pop1/bouyer/tmp/i386/obj/local/pop1/bouyer/netbsd-2-0-clean/src/sys/arch/i386/compile/GENERIC.MP i386
Architecture: i386
Machine: i386
>Description:
TCP connections from a 2.0_RC2 or a 1.6.2 box to 2.0_STABLE sometimes
hang, with data pending in the send-queue on the 2.0_STABLE box.
Data in the other direction (*to* the 2.0_STABLE system) is still
transmitted, as shown by tcpdump. I've seen this with ssh connections
to the 2.0_STABLE system with lots of stdout traffic, and X11
connections from software running on the 2.0_STABLE system
displaying on the 1.6.2 box. I never saw this while the 2.0_STABLE
box was running a 2.0_BETA kernel.
Note that the 2.0_STABLE box is much faster than the 2.0_RC2 and 1.6.2
ones (dual-CPU PIII 1GHz vs Ultra/1 143MHz and alpha ev4 233MHz).
Here's an example netstat output on the 2.0_STABLE system with
a connection in this state:
tcp 0 100 pop.65491 armandeche.X11 ESTABLISHED
The output of 'tcpdump host armandeche and port 6000' running
on the 2.0_STABLE system is available at
http://www-asim.lip6.fr/~bouyer/tcp_bug.txt.gz. The capture was
started when the X11 window was already open, and I interrupted it
once the X11 window was wedged.
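A connection in the state shown by the netstat line above can be
spotted mechanically. What follows is a minimal Python sketch, not
part of the original observations: it scans netstat-style text for
ESTABLISHED TCP sockets with a nonzero send queue. The function names
stuck_candidates and still_stuck are hypothetical, and the column
layout is assumed to match the line quoted above (proto, recv-q,
send-q, local address, foreign address, state). A nonzero send queue
is normal for a moment, so only a connection present in two snapshots
taken some seconds apart is a candidate for this hang.

```python
def stuck_candidates(netstat_text):
    """Return (local, remote, send_q) for ESTABLISHED TCP sockets
    that still have unsent data queued."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local foreign state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        _proto, recv_q, send_q, local, remote, state = fields
        if not (recv_q.isdigit() and send_q.isdigit()):
            continue  # header line or unexpected format
        if state == "ESTABLISHED" and int(send_q) > 0:
            hits.append((local, remote, int(send_q)))
    return hits


def still_stuck(earlier, later):
    """Keep the connections from `earlier` that also appear in `later`,
    i.e. whose send queue was nonzero in both snapshots."""
    later_keys = {(local, remote) for local, remote, _ in later}
    return [c for c in earlier if (c[0], c[1]) in later_keys]


sample = "tcp 0 100 pop.65491 armandeche.X11 ESTABLISHED"
print(stuck_candidates(sample))
# prints [('pop.65491', 'armandeche.X11', 100)]
```

To use it, capture the netstat output twice a few seconds apart and
pass both parsed results to still_stuck.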
>How-To-Repeat:
Either ssh from a slow system to a fast one and start something that
prints a lot to stdout, such as build.sh -j4,
or start an X11 program with large graphical windows and lots of
redraws (such as pkgsrc/cad/eagle) on a fast box and display it on a
slow one.
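If a full build is inconvenient, any program that writes a large
volume of text to stdout should load the ssh connection in a similar
way. The following is a hypothetical stand-in, assuming Python is
available on the fast box; it has not been confirmed to trigger the
hang. The name flood and its parameters are made up for illustration.

```python
import sys


def flood(lines, width=72, out=sys.stdout):
    """Write `lines` numbered lines of filler text to `out` and
    return the number of characters written."""
    payload = "x" * width
    written = 0
    for i in range(lines):
        line = f"{i:08d} {payload}\n"
        out.write(line)
        written += len(line)
    out.flush()
    return written
```

Run inside the ssh session on the fast system, a call such as
flood(1000000) produces roughly 80 MB of output; the line numbers make
it easy to see where the display stopped if the connection wedges.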
>Fix:
unknown.