Why SFTP performance sucks, and how to fix it

To: ietf-ssh%netbsd.org@localhost
Subject: Why SFTP performance sucks, and how to fix it
From: pgut001%cs.auckland.ac.nz@localhost (Peter Gutmann)
Date: Tue, 8 Jul 2003 16:01:02 +1200

Now that I've got your attention... :-).

The following is a section of the (not-yet-published) paper "Performance
Characteristics of Application-level Security Protocols", which looks at,
well, performance characteristics of application-level security protocols.
One of these is SSH (the rest are SSL, PGP, and S/MIME, in case anyone cares).
The paper is still a work in progress and probably won't be published for a
while (I'm probably submitting it to Usenix Security next year).  Since the
information in this section could be useful to SSH developers, I'm posting it
here.  If people find it of any use, I might
later post it to one or two general crypto lists to let other crypto
developers know, since it describes the SFTP performance problem and how to
fix it.

-- Snip --

6.3 The SSHv2 and SFTP Performance Handbrake

In 1977, Ward Christensen created the Xmodem data transfer protocol
[Christensen 1977].  Coming in an era of 300bps modems and unreliable links,
Xmodem divided data into 128-byte packets and required an Ack to be sent for
each packet before the next one could be transmitted.  As modems became faster
and links more reliable, the need to Ack each Xmodem packet became more and
more of a performance handbrake, since no matter how fast or reliable the
link, no more than 128 bytes of data could be sent without waiting 1 RTT for
the Ack.  The solution to the problem was to increase the packet size (Ymodem,
Xmodem-1K), and drop the requirement to Ack each packet (Ymodem-g, Zmodem)
[Forsberg 1988].  The latter was perfectly acceptable, since by then modems
included their own error correction and flow control mechanism.

Unfortunately this performance handbrake was reinvented in the SSHv2 protocol.
Like Ymodem-g and Zmodem running over modern modems, TCP/IP provides a
reliable, flow-controlled transport layer for the SSH protocol.  SSHv2 however
introduced an additional form of flow control that, like Xmodem, requires the
receiver to Ack each packet before more can be sent (the details aren't quite
as straightforward as this since the SSHv2 specification describes things in
terms of packets and data windows, but effectively it's the Xmodem per-packet
Ack).  Most implementations seem to use packet sizes of 16K or occasionally
32K, with some going as low as 4K.  What this means is that no matter how fast
the link, every (say) 16K the transmission stops for 1 RTT until the other
side has sent its Ack (referred to as a window adjust in SSHv2 terminology).
Consider for example the effect of this on a T1 international link with a
half-second RTT.  With the handbrake in operation, the link can run at only
17% of its total capacity.  This performance hit is so noticeable that it is
mentioned in the FAQs of some SSH implementations [PuttyFAQ].

In addition to the protocol-level handbrake, the SFTP protocol that runs on
top of SSH contains its own handbrake.  This protocol recommends that reads
and writes consist of no more than 32K of data, even though it's running over
the reliable SSH transport which is in turn running over the reliable TCP/IP
transport.  One common implementation limits SFTP packets to 4K bytes,
resulting in a mere 4% link utilisation in the previously-presented scenario.
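
As a rough check on the utilisation figures above, the following sketch
computes the fraction of a link that can be used when at most one window of
data is sent per RTT.  It is a simplified model, not part of the paper: it
ignores the time needed to put the window on the wire, and the T1 line rate
of 1.544 Mbit/s and the function name are assumptions made for illustration.

```python
def window_limited_utilisation(window_bytes, rtt_seconds, link_bits_per_second):
    """Fraction of link capacity usable when one window is sent per RTT.

    Simplified model: the sender transmits window_bytes, then idles until
    the Ack (window adjust) arrives one RTT later.
    """
    throughput_bps = window_bytes * 8 / rtt_seconds
    return min(1.0, throughput_bps / link_bits_per_second)


T1_BPS = 1_544_000  # assumed T1 line rate in bits per second
RTT = 0.5           # half-second international round trip, as in the text

if __name__ == "__main__":
    for window in (4 * 1024, 16 * 1024, 32 * 1024):
        u = window_limited_utilisation(window, RTT, T1_BPS)
        print(f"{window // 1024:2d}K per round trip: {u:.0%} of link capacity")
```

Under this model the 16K and 4K cases come out at about 17% and 4%,
matching the figures quoted above.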

The fix for this problem is obvious: Remove the handbrake.  There is no good
reason for the per-packet Ack, and certainly other protocols such as SSHv1 and
SSL/TLS function perfectly without it (the absence of the handbrake in SSHv1
is why SSH FAQs observe that the SSHv1 scp is so much faster than the SSHv2
SFTP, even though SFTP is overall a better design).  The effects of running
without the handbrake were investigated using cryptlib with a fairly
rudimentary implementation of SFTP running over the built-in SSHv2.
cryptlib's SSH implementation has always set the window size to INT_MAX (some
implementations have problems with UINT_MAX as the window size), which
effectively disables the SSH-level handbrake.  The SFTP implementation
followed suit, requesting a read/write of the entire file at once rather than
breaking it up into little packets at the SFTP level (packetisation is already
handled at the SSH and TCP/IP layers).  Run over an international link (pretty
much a given when you're in New Zealand), this SFTP implementation was around
five times faster than the PuTTY implementation of SFTP talking to OpenSSH,
which sends data in 4K SFTP packets and (by extension) 4K SSH packets.  Even
over a low-latency link, the difference was impressive: cryptlib was an order
of magnitude faster than PuTTY on the loopback interface (latency being
relative in this case).

The SSH-level handbrake can therefore be provisionally removed by having
implementations set the window size to INT_MAX, and permanently removed by
deprecating the Ack/window-based flow control and perhaps optionally providing
Xon/Xoff-style flow control if absolutely necessary (as was mentioned earlier,
both SSHv1 and SSL/TLS function fine without requiring this).  The SFTP-level
handbrake can be removed by eliminating the maximum packet-size wording of the
SFTP specification, and recommending that implementations read and write all
data at once rather than engaging in additional redundant packetisation at the
SFTP level.
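
The provisional fix can be sketched as follows: a minimal example of building
the payload of an SSH channel-open message with the initial window size set
to INT_MAX.  The field layout (message type 90, channel type string, sender
channel, initial window size, maximum packet size) follows the SSH connection
protocol specification; the helper names and the 32K maximum packet size are
illustrative assumptions, and this is not cryptlib's actual code.

```python
import struct

SSH_MSG_CHANNEL_OPEN = 90   # message number from the SSH connection protocol
INT_MAX = 0x7FFFFFFF        # used instead of UINT_MAX, which some peers mishandle


def ssh_string(data: bytes) -> bytes:
    """Encode an SSH 'string': 32-bit big-endian length followed by the bytes."""
    return struct.pack(">I", len(data)) + data


def channel_open_payload(sender_channel: int,
                         initial_window: int = INT_MAX,
                         max_packet: int = 32768) -> bytes:
    """Build a "session" channel-open payload (hypothetical helper).

    Advertising an effectively infinite initial window means the peer never
    has to stop and wait for a window adjust from this side.
    """
    return (bytes([SSH_MSG_CHANNEL_OPEN])
            + ssh_string(b"session")
            + struct.pack(">III", sender_channel, initial_window, max_packet))


if __name__ == "__main__":
    print(channel_open_payload(0).hex())
```

This only covers what the receiver advertises; a sender talking to a peer
that advertises a small window is still subject to that peer's limit unless
it ignores it.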

Most of this can be effected through a simple code change; however,
implementors should be aware that many implementations will still stop and
wait for an Ack after a certain amount of data has been transmitted, even with
an effectively infinite-sized window.  On the sender side things aren't quite
so bad: experimentation has shown that it's fairly safe to ignore the
receiver's window size and send data at the maximum rate possible, discarding
any window adjusts that arrive (the only slight complication is that it's
occasionally necessary to stop sending for a moment and clear the read channel
of the accumulation of Acks that have arrived while sending).  Since most
implementations include a facility for checking the peer's software version to
identify and work around implementation bugs, detecting pre-handbrake-fix
implementations and providing the appropriate slower interpretation of the
protocol should be relatively straightforward.  In addition, FAQs about the
poor performance of SFTP will need to be updated.

[Christensen 1977] "MODEM.ASM", Ward Christensen, August 1977 (the Xmodem
protocol was defined in terms of "What this program does" rather than being
formally documented; the author described it in a Compuserve post some years
later as "a quick hack I threw together").

[Forsberg 1988] "Xmodem/Ymodem Protocol Reference: A compendium of documents
describing the Xmodem and Ymodem File Transfer Protocols", Chuck Forsberg,
October 1988.

[PuttyFAQ] "PuTTY FAQ", Simon Tatham, 2003,
http://www.chiark.greenend.org.uk/~sgtatham/putty/faq.html, question A.6.8,
"PSFTP transfers files much slower than PSCP".

-- Snip --

While I'm pointing out things that should be fixed in the spec, my other big
gripe is the way the initial message is handled.  Currently the spec describes
a rather messy mechanism where both sides start by shouting at each other and
then engage in a complex dance to sort out what's what afterwards (the
"guessing" stuff).  This leads to really messy implementations when one of the
partners doesn't get the dance steps right.

There is no good reason for this complication in the protocol.  I don't buy
the RTT argument given in the SSH-transport draft: the guessing stuff saves
one whole RTT, but then the incredibly chatty authentication protocol ("Would
you like to authenticate then?" - "Yes I'd like to authenticate" - "How would
you like to authenticate?" - "Well, would the following suit you?" - "That
looks about right, let's do it" - "Right, I'm about to start" - etc etc etc)
more than makes up for any minuscule savings during the initial handshake.

The way to fix this is simple: Replace all the guessing stuff and the complex
rules that go with it with:

  Key exchange begins by each side sending lists of supported algorithms.  The
  server sends its list of supported algorithms first, the client chooses
  which ones it prefers that it also supports and sends back its choice in the
  reply.

That removes all of the handshake-dance complexity, and vastly simplifies
implementations.
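
To show how little code the simplified negotiation needs, here is a minimal
sketch of the client side: the server's list arrives first, and the client
picks the first entry in its own preference order that the server also
supports.  The function name and the example algorithm lists are assumptions
for illustration; the proposed text above does not specify them.

```python
def choose_algorithm(server_offers, client_preferences):
    """Return the client's most-preferred algorithm that the server supports.

    server_offers: algorithm names sent by the server in its first message.
    client_preferences: the client's supported algorithms, most preferred first.
    Returns None when there is no algorithm in common.
    """
    offered = set(server_offers)
    for name in client_preferences:
        if name in offered:
            return name
    return None


if __name__ == "__main__":
    server = ["3des-cbc", "aes128-cbc", "blowfish-cbc"]
    client = ["aes256-cbc", "aes128-cbc", "3des-cbc"]
    # The client's choice goes back to the server in the reply.
    print(choose_algorithm(server, client))
```

There is no guessing and no state to unwind: a None result simply means the
connection cannot proceed.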

Oh yes, in case anyone finds the SFTP info above useful and wants to reference
it for some reason, please cite it as '"Performance Characteristics of
Application-level Security Protocols", Peter Gutmann, to appear', since it's
not officially published yet.

Peter.

