IETF-SSH archive
Re: Protocol ambiguity: want_reply vs CHANNEL_CLOSE
I really still think this is a simple issue for the request sender to solve.
After CLOSE is received, instead of keeping the channel alive indefinitely,
keep it in a zombie state for up to X minutes. A channel in this zombie
state should not keep a session open that would otherwise be closed. If you
receive any outstanding responses while the channel is a zombie, you can
recognize them and ignore them, and you won't mix them up with any other
channels that might have been opened in the meantime.
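
A minimal sketch of that zombie-state bookkeeping, in Python. Everything here is an illustrative assumption rather than code from any SSH implementation: the class name, the method names, the string return values, and the injected clock are all made up for the example, and the lifetime "X" is left as a parameter.

```python
import time


class ChannelTable:
    """Tracks open channels plus recently closed 'zombie' channel ids.

    Hypothetical helper: names and return values are illustrative only.
    """

    def __init__(self, zombie_ttl, clock=time.monotonic):
        self.zombie_ttl = zombie_ttl  # the "X minutes", in seconds
        self.clock = clock
        self.open = {}     # channel id -> outstanding want_reply requests
        self.zombies = {}  # channel id -> [expiry time, outstanding replies]

    def open_channel(self):
        """Pick the lowest id that is neither open nor a zombie."""
        self._expire()
        cid = 0
        while cid in self.open or cid in self.zombies:
            cid += 1
        self.open[cid] = 0
        return cid

    def request_sent(self, cid):
        """Record that a request with want_reply set went out."""
        self.open[cid] += 1

    def close_received(self, cid):
        """On CLOSE, park the id as a zombie only if replies are pending."""
        pending = self.open.pop(cid)
        if pending:
            self.zombies[cid] = [self.clock() + self.zombie_ttl, pending]

    def reply_received(self, cid):
        """Classify an incoming CHANNEL_SUCCESS or CHANNEL_FAILURE."""
        self._expire()
        if self.open.get(cid, 0) > 0:
            self.open[cid] -= 1
            return "deliver"
        if cid in self.zombies:
            entry = self.zombies[cid]
            entry[1] -= 1
            if entry[1] == 0:
                del self.zombies[cid]  # all late replies accounted for
            return "ignore"
        return "protocol-error"

    def session_should_stay_open(self):
        """Zombies deliberately do not count towards keeping a session up."""
        return bool(self.open)

    def _expire(self):
        now = self.clock()
        expired = [c for c, (exp, _) in self.zombies.items() if exp <= now]
        for cid in expired:
            del self.zombies[cid]
```

The point of the design is that a zombie id is never handed out again while late replies may still arrive, so a stray CHANNEL_FAILURE cannot be attributed to a newer channel.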
You don't need to know what OpenSSH implementors think because this strategy
works regardless of their opinion. You don't need a spec or a community
consensus to back this strategy because you don't have to defend this
strategy against anyone; it works with all request responders.
It simply makes sense to implement.
-----Original Message-----
From: Simon Tatham
Sent: Monday, April 7, 2014 07:45
To: Niels Möller
Cc: ietf-ssh%netbsd.org@localhost
Subject: Re: Protocol ambiguity: want_reply vs CHANNEL_CLOSE
nisse%lysator.liu.se@localhost (Niels Möller) wrote:
> Can you give us some more details on the problems you have encountered?
Background: as I mentioned in a previous message, PuTTY periodically
sends a bogus channel request "winadj@putty.projects.tartarus.org",
with want_reply set. There is no correct handling defined for this
request, so we expect CHANNEL_FAILURE from the server (though we know
of at least one server which amusingly sends CHANNEL_SUCCESS :-). We
use the timing of that failure message to gauge the round-trip time
and tune our window size. All problems I've seen with this race
condition have related to these winadj messages. (So the usual
workaround if a server is not playing nicely with PuTTY is to suppress
winadjes.)
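
To make the timing idea concrete, here is a small Python sketch of measuring round-trip time from probe replies. It is not PuTTY's code: the class and method names are hypothetical, it assumes replies come back in the order the requests were sent, and it stops at the RTT measurement without attempting any window-size tuning.

```python
from collections import deque


class RttProbe:
    """Pairs each probe request with its reply to estimate round-trip time.

    Hypothetical helper; assumes replies arrive in request order.
    """

    def __init__(self):
        self.sent = deque()   # send timestamps of probes awaiting a reply
        self.last_rtt = None  # most recent measurement, in seconds

    def probe_sent(self, now):
        """Call when a want_reply probe request is written to the wire."""
        self.sent.append(now)

    def reply_received(self, now):
        """Call on CHANNEL_FAILURE (or CHANNEL_SUCCESS) for a probe.

        Either reply type works, since only its arrival time matters.
        """
        if not self.sent:
            raise ValueError("reply received with no probe outstanding")
        self.last_rtt = now - self.sent.popleft()
        return self.last_rtt
```

Timestamps are passed in by the caller, so the same code works with time.monotonic() in real use and with fixed values in a test.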
The first problem we noticed was precisely your #1: PuTTY would crash
out with 'Received SSH2_MSG_CHANNEL_FAILURE for nonexistent channel
256' type messages, which turned out to be because the server was
sending us CHANNEL_FAILURE after CHANNEL_CLOSE whereas we were
expecting that CHANNEL_CLOSE meant the server would send no further
messages about that channel.
We observed this with several servers: my email archives mention
Dropbear 0.51 in 2008, OpenSSH 5.1p1 in 2010, and at least one or two
SSH servers in embedded devices for which I didn't get full details.
Dropbear promptly fixed the problem (changing the server to stop
sending post-close CHANNEL_FAILURE). OpenSSH did not, because Damien
Miller questioned my analysis of it as a server-side bug on the basis
that the spec was unclear (which indeed it is):
https://bugzilla.mindrot.org/show_bug.cgi?id=1818
Since OpenSSH didn't make any changes, and since I'd forgotten the
previous Dropbear incident at that time, I implemented PuTTY's current
workaround, which is to defer _sending_ CHANNEL_CLOSE until it's seen
all outstanding CHANNEL_FAILURE messages, so that (from our POV) the
channel is still not completely closed at the point when those
failures arrive. Unfortunately, it apparently didn't occur to me that
that would break interoperation with servers that _don't_ send the
outstanding failure messages, because now PuTTY sits and waits for a
failure message that will never arrive and so users see hangs on the
client side.
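
The deferred-close workaround and its failure mode can be sketched in a few lines of Python. This is an illustration of the logic as described above, not PuTTY's implementation, and all names are hypothetical.

```python
class DeferredClose:
    """Holds back CHANNEL_CLOSE until every want_reply request is answered.

    Hypothetical sketch of the workaround described in the text.
    """

    def __init__(self):
        self.outstanding = 0       # requests still awaiting a reply
        self.close_wanted = False  # we have decided to close the channel
        self.close_sent = False    # CHANNEL_CLOSE has actually gone out

    def request_sent(self):
        self.outstanding += 1

    def reply_received(self):
        self.outstanding -= 1
        return self._try_close()

    def want_close(self):
        self.close_wanted = True
        return self._try_close()

    def _try_close(self):
        """Return True at the moment CHANNEL_CLOSE should be sent."""
        if self.close_wanted and not self.close_sent and self.outstanding == 0:
            self.close_sent = True
            return True
        return False
```

If the peer never answers the outstanding request, reply_received() is never called, _try_close() never succeeds, and the channel stays half-open. That is the hang users see.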
So, clearly some kind of workaround is needed. But separately from the
question of stopgap workarounds while we all sort ourselves out, I'd
like to establish a consensus on what the right behaviour _is_, so
that SSH implementations can gradually converge on that, and so that
in any further situation where a client and a server don't work well
together, there's a document to point to which will make it clear
which one is wrong, so that we don't have 'no, _you_ fix it' deadlocks
between equally stubborn implementors.
Responses so far have all said that the consensus is to rule that
sending FAILURE after CLOSE is wrong, and that agrees with my original
belief (before I started finding out that not everyone agreed). The
only people I know who have so far _deliberately_ gone with the other
behaviour (in the sense of arguing against fixing it when it was
pointed out) are the OpenSSH maintainers. Are any of them reading
this?
Cheers,
Simon
--
Simon Tatham "I thought I'd put my foot so far into my mouth I
<anakin%pobox.com@localhost> wouldn't be able to sit down without standing up."