Subject: kern/1659: TCP connections may be left hanging in FIN_WAIT_2
To: None <gnats-bugs@gnats.netbsd.org>
From: Arne Henrik Juul <arnej@imf.unit.no>
List: netbsd-bugs
Date: 10/20/1995 17:44:35
>Number: 1659
>Category: kern
>Synopsis: TCP connections may be left hanging in FIN_WAIT_2
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: kern-bug-people (Kernel Bug People)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Oct 20 11:50:01 1995
>Last-Modified:
>Originator: Arne H. Juul
>Organization:
Norwegian University of Technology and Science
>Release: NetBSD-current, last updated 18 oct 1995
>Environment:
Originally seen on NetBSD/i386 with 1.0. Present
in all NetBSD variants, and probably in other BSD-derived systems as well.
>Description:
Recently we've been setting up a NetBSD machine with various network
services. After some time, the machine got many connections (> 2000)
hanging in FIN_WAIT_2, with no outstanding data or controlling process.
There is a timer to prevent this situation, since there's not much
point in keeping the socket around (no data can be sent either way).
For details on this, see the Wright & Stevens book (TCP/IP Illustrated,
volume 2).
However, this timer is only started if the socket went into FIN_WAIT_2
with *both* data directions closed down. The application in question
(a small web service) would first shut down its own writing direction,
then wait for the other end to close the other direction. Only when
this happened (or alternatively after an application-internal timeout)
would the application close the socket.
This means that the socket was only half-closed when it went into
FIN_WAIT_2, so data could still arrive at the socket.
So the FIN_WAIT_2 timer wasn't started at this time.
However, the timer wasn't started later when the application closed
the socket either. This seems like an omission: The system arrives
in the same state in all respects, except that the timer isn't
started. Therefore, the FIN_WAIT_2 timer should be started when
the application closes a socket that already is in the FIN_WAIT_2
state.
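To make the two paths concrete, here is a toy user-space model of the
logic described above. It is only a sketch and not kernel code: the
names toy_conn, toy_close, toy_fin_acked and TOY_MAXIDLE are all made
up for illustration, and the with_fix flag simply switches the
proposed behaviour on and off.

```c
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, none is a kernel symbol. */
enum toy_state { TOY_ESTABLISHED, TOY_FIN_WAIT_1, TOY_FIN_WAIT_2 };

struct toy_conn {
	enum toy_state state;
	bool cantrcvmore;	/* application has closed the receive side */
	int timer_2msl;		/* 0 means the timer is not running */
};

#define TOY_MAXIDLE 10		/* stand-in for the real idle limit */

/* Application half-closes: our FIN goes out, receiving stays open. */
void
toy_shutdown_write(struct toy_conn *c)
{
	if (c->state == TOY_ESTABLISHED)
		c->state = TOY_FIN_WAIT_1;
}

/* Peer ACKs our FIN: timer starts only if both directions are closed. */
void
toy_fin_acked(struct toy_conn *c)
{
	if (c->state != TOY_FIN_WAIT_1)
		return;
	if (c->cantrcvmore)
		c->timer_2msl = TOY_MAXIDLE;
	c->state = TOY_FIN_WAIT_2;
}

/* Application closes the socket; with_fix enables the proposed change. */
void
toy_close(struct toy_conn *c, bool with_fix)
{
	c->cantrcvmore = true;
	if (c->state == TOY_ESTABLISHED)
		c->state = TOY_FIN_WAIT_1;
	else if (with_fix && c->state == TOY_FIN_WAIT_2)
		c->timer_2msl = TOY_MAXIDLE;
}
```

In this model a plain close followed by the ACK of our FIN leaves the
timer running, while half-close, ACK, then close leaves it at zero
unless with_fix is set.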
In our case, we could also `fix' the application to just close the
socket instead of half-closing it, but this is still a loophole
in the FIN_WAIT_2 rules that shouldn't be there.
>How-To-Repeat:
Well, you need to run a server as described above, with lots
of badly-behaving clients to really see this happening.
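For reference, the close sequence the server uses looks roughly like
the sketch below. This is not the actual server code: the function
name linger_close and the use of poll() are assumptions made for
illustration. A client that ACKs the FIN but never closes its own
side is what leaves the connection in FIN_WAIT_2 once the timeout
path is taken.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: half-close fd, wait for the peer to close its
 * side, then close fd.  The timeout restarts whenever data arrives.
 * Returns 1 if the peer closed, 0 on timeout, -1 on error.
 */
int
linger_close(int fd, int timeout_ms)
{
	char buf[512];
	struct pollfd pfd;
	ssize_t n;
	int rv;

	/* Send our FIN; the socket can still receive data. */
	if (shutdown(fd, SHUT_WR) == -1) {
		close(fd);
		return (-1);
	}

	pfd.fd = fd;
	pfd.events = POLLIN;
	for (;;) {
		rv = poll(&pfd, 1, timeout_ms);
		if (rv <= 0)
			break;		/* 0: timed out, -1: error */
		n = read(fd, buf, sizeof(buf));
		if (n == 0) {
			rv = 1;		/* peer closed its direction */
			break;
		}
		if (n < 0) {
			rv = -1;
			break;
		}
		/* Late data from the peer is discarded; keep waiting. */
	}

	/* Final close; if the peer never sent a FIN, we sit in FIN_WAIT_2. */
	close(fd);
	return (rv);
}
```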
>Fix:
This patch (to -current) starts the timer when
the user does close() if the socket is already in FIN_WAIT_2.
According to Stevens this is OK.
*** tcp_usrreq.c.orig Fri Oct 20 17:14:14 1995
--- tcp_usrreq.c Fri Oct 20 17:14:59 1995
*************** tcp_usrclosed(tp)
*** 521,530 ****
--- 521,532 ----
  		tp->t_state = TCPS_LAST_ACK;
  		break;
  	}
  	if (tp && tp->t_state >= TCPS_FIN_WAIT_2)
  		soisdisconnected(tp->t_inpcb->inp_socket);
+ 	if (tp && tp->t_state == TCPS_FIN_WAIT_2)
+ 		tp->t_timer[TCPT_2MSL] = tcp_maxidle;
  	return (tp);
  }
  
  /*
   * Sysctl for tcp variables.
>Audit-Trail:
>Unformatted: