Subject: Re: FIN_WAIT_2's remaining in connection list
To: None <tech-net@netbsd.org>
From: Ryan Younce <ryan@manunkind.org>
List: tech-net
Date: 10/22/2000 21:27:31
Thus spake Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>:
> > I was perusing the FreeBSD bugs report page and discovered a bug marked
> > serious in which connections to a remote Netware web server (and then
> > closing the connection) would cause the connection to remain in the
> > FIN_WAIT_2 state for several days. The submitter indicated that he had
> > run a script to open several thousand connections, and they all remained
> > in the list for several days before being cleared out.
>
> Be careful. FIN_WAIT_2 is a (potentially) long-term stable state in
> TCP (when one direction has closed and the other hasn't); simply
> deleting connections which have been in FIN_WAIT_2 state for 2*MSL may
> cause data loss, because the connection is still actually open in the
> inbound direction at that point!
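Bill's point about the inbound direction can be seen with an ordinary TCP half-close. What follows is a minimal sketch, assuming a POSIX system with loopback TCP available; half_close_demo is a hypothetical helper, not part of the patch in the PR. The client calls shutdown(SHUT_WR), which sends a FIN and, once the server's kernel ACKs it, leaves the client in FIN_WAIT_2. The server sees end-of-file but can still write, and the half-closed client still reads that data. A timer that reaped the client's connection in that window would drop the "late data".

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: opens a loopback TCP connection, half-closes the
 * client with shutdown(SHUT_WR), then lets the server keep writing.
 * Copies whatever the client still receives into buf (NUL-terminated)
 * and returns its length, or -1 if any setup step fails.
 */
static ssize_t half_close_demo(char *buf, size_t len)
{
    const char *late = "late data";
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int lfd, cfd = -1, sfd = -1;
    ssize_t n, total = -1;
    char tmp[16];

    assert(buf != NULL && len > 0);

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* kernel picks a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto out;
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto out;

    /* Client closes its sending direction: a FIN goes out (FIN_WAIT_1),
     * and once the server's kernel ACKs it the client is in FIN_WAIT_2. */
    if (shutdown(cfd, SHUT_WR) < 0)
        goto out;

    /* The server sees end-of-file on its receive side... */
    if (read(sfd, tmp, sizeof tmp) != 0)
        goto out;

    /* ...but its sending direction is still open, so it can keep writing. */
    if (write(sfd, late, strlen(late)) != (ssize_t)strlen(late))
        goto out;
    close(sfd);                             /* server's FIN: client moves to TIME_WAIT */
    sfd = -1;

    /* The half-closed client still receives everything the server sent. */
    total = 0;
    while ((size_t)total < len - 1 &&
           (n = read(cfd, buf + total, len - 1 - (size_t)total)) > 0)
        total += n;
    buf[total] = '\0';

out:
    if (sfd >= 0)
        close(sfd);
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return total;
}
```

Called with a 32-byte buffer, the client ends up holding "late data", all of it sent after the client had already closed its own side.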

I have the feeling that a "more correct" approach to this would require
something more than the provided patch. What differentiates FIN_WAIT_2
states like these from the ones that Apache brought to light (back in
'96, I think)? I believe it was then that FreeBSD 2.x specifically added
FIN_WAIT_2 timeouts, and I know the TCP specification does not specify
a value for such a timeout.

Somebody please correct me if I'm wrong, as I'm sure my knowledge of TCP
state transitions is a bit flaky, but here is how I understand the state
transitions for a client-side close:

  1. The local end closes its side of the connection, sending a segment
     containing a FIN to the remote end. The state is now FIN_WAIT_1.

  2. If the remote end sends back only an ACK, the local end begins
     waiting for the remote end to close its side of the connection, at
     which point the remote end will send us a FIN. The state is now
     FIN_WAIT_2.

  3. Only when the remote end has sent this FIN (unless we intervene
     manually, as the patch does) will the local end respond with an ACK
     and move the connection to TIME_WAIT.
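The three steps above can be written as a tiny state machine. This is a minimal sketch: the state and event names are hypothetical, and it models only the sequence described here, deliberately leaving out other paths that RFC 793 allows (such as simultaneous close, or a FIN arriving together with the ACK of our FIN). Events that don't apply in a given state are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Only the states on the active-close path described above. */
enum close_state { ST_ESTABLISHED, ST_FIN_WAIT_1, ST_FIN_WAIT_2, ST_TIME_WAIT };

/* Things that can happen to the local end during the close. */
enum close_event { EV_APP_CLOSE, EV_RECV_ACK, EV_RECV_FIN };

/* One transition; events that don't apply leave the state unchanged. */
static enum close_state close_step(enum close_state s, enum close_event ev)
{
    switch (s) {
    case ST_ESTABLISHED:
        if (ev == EV_APP_CLOSE)
            return ST_FIN_WAIT_1;   /* we send our FIN */
        break;
    case ST_FIN_WAIT_1:
        if (ev == EV_RECV_ACK)
            return ST_FIN_WAIT_2;   /* peer ACKed our FIN but has not closed */
        break;
    case ST_FIN_WAIT_2:
        if (ev == EV_RECV_FIN)
            return ST_TIME_WAIT;    /* we ACK the peer's FIN */
        break;
    case ST_TIME_WAIT:
        break;                      /* not modeled: a 2*MSL timer ends TIME_WAIT */
    }
    return s;
}

/* Replays a sequence of events starting from ESTABLISHED. */
static enum close_state close_run(const enum close_event *evs, size_t n)
{
    enum close_state s = ST_ESTABLISHED;

    assert(evs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s = close_step(s, evs[i]);
    return s;
}
```

Replaying close, ACK, FIN ends in ST_TIME_WAIT. Replaying only close and ACK, which is apparently what happens with the Netware server in the PR, leaves the model in ST_FIN_WAIT_2 for good, since nothing in it times out.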

When I tested this against BSD servers, they went through the entire
sequence and the connection correctly arrived at TIME_WAIT. The server
listed in the PR keeps the connection in FIN_WAIT_2 for an extraordinarily
long time, so I assume the Netware machine in question never sends the
final FIN back to the local end.

As best I can tell, the timer is set to twice the maximum segment lifetime
(2*MSL) in /sys/netinet/tcp_usrreq.c. I don't know for certain how long
that is, but from what I can tell from the kernel source, I believe it is
2 minutes.
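How long "2*MSL" actually is depends on the MSL a stack uses, and BSD kernels count these timers in slow-timeout ticks rather than seconds. A minimal sketch of the arithmetic, assuming PR_SLOWHZ is 2 ticks per second (as in BSD's sys/protosw.h) and taking the MSL in seconds as a parameter; the function names are hypothetical, not kernel functions. For reference, RFC 793 defines MSL as 2 minutes, while BSD-derived stacks have traditionally used 30 seconds.

```c
#include <assert.h>

/* Assumed to match BSD: the protocol slow timer fires twice a second. */
#define PR_SLOWHZ 2

/* Hypothetical: a 2*MSL timer value, in slow-timer ticks, for an MSL in seconds. */
static int two_msl_ticks(int msl_seconds)
{
    assert(msl_seconds > 0);
    return 2 * msl_seconds * PR_SLOWHZ;
}

/* Hypothetical: convert slow-timer ticks back to whole seconds. */
static int ticks_to_seconds(int ticks)
{
    return ticks / PR_SLOWHZ;
}
```

With a 30-second MSL this gives 120 ticks, i.e. 60 seconds; with RFC 793's 2-minute MSL it gives 240 seconds. Either way the timer is measured in minutes, not the days reported in the PR.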

My biggest question, I think, is this: is this a problem with *BSD/Linux,
or just a caveat of TCP? It seems like too unfortunate a consequence of
simply closing a connection from a BSD machine to a Netware server.
--
Ryan Younce |"A language that doesn't have everything is actually
ryan@manunkind.org | easier to program in than some that do."
www.manunkind.org/~ryan | -- Dennis M. Ritchie