IETF-SSH archive
Re: sftp rename not good.
I think this disagreement reflects a more basic difference of opinion on
the purpose of the sftp protocol, which needs to be resolved. One side
views sftp as defining a fully abstract storage mechanism, having its own
internally consistent semantics which all compliant implementations must
uphold. The constraint might be stated thus:
"Given a sequence S of sftp operations, and an sftp-observable initial
state I, the sftp-observable result state R of executing S from I must be
the same on any implementation."
In other words, one purpose of sftp is to hide the details of server host
operation in favor of a predictable storage abstraction.
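To make that constraint concrete, here's a rough sketch in Python of what checking it might look like. Everything in it -- the assumption that "sftp" behaves like a paramiko SFTPClient (listdir/open/rename/remove), and the particular definition of "observable state" -- is my own illustration, not anything the draft defines:

    def observable_state(sftp, path="."):
        """Everything a client can see through the protocol alone:
        the names in a directory and the bytes stored under each name."""
        state = {}
        for name in sorted(sftp.listdir(path)):
            f = sftp.open(path + "/" + name, "rb")
            state[name] = f.read()
            f.close()
        return state

    def run_script(sftp, script):
        """Apply a fixed sequence S of sftp operations, e.g.
        [("rename", "foo", "Bar"), ("remove", "baz")]."""
        for op, *args in script:
            getattr(sftp, op)(*args)

    # Viewpoint #1 in one line: for any two conforming servers sftp_a and
    # sftp_b started from the same observable initial state I, after running
    # the same script S the following must hold:
    #
    #     assert observable_state(sftp_a) == observable_state(sftp_b)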
The other side of the fence says no, no -- the purpose of sftp is to
provide convenient remote manipulation of a host's filesystem, in such a
way as to be as familiar as possible to users of the host OS. Thus, each
implementation should be free to choose mappings of basic sftp operations
(such as SSH_FXP_RENAME) onto the server's filesystem primitives, in a way
its authors think will be most useful and familiar to users. In
non-obvious cases, users will have no way of knowing what the mapping will
be, short of reading the software documentation (not the protocol spec),
or just trying it out.
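To make that concrete, here are two equally defensible (and entirely hypothetical) ways a Unix server might map SSH_FXP_RENAME onto local primitives, sketched in Python for brevity. Nothing in the current draft tells a client which one it will get:

    import os

    def rename_posix_style(oldpath, newpath):
        # Hand the request straight to rename(2): on POSIX systems an existing
        # newpath is silently replaced.  (Note that os.rename itself already
        # varies by platform -- on Windows it fails if newpath exists.)
        os.rename(oldpath, newpath)

    def rename_refuse_existing(oldpath, newpath):
        # Stricter mapping: refuse to clobber an existing target and return a
        # failure status to the client instead.  (Check-then-rename is racy;
        # a real server would want something atomic here.)
        if os.path.lexists(newpath):
            raise FileExistsError(newpath)
        os.rename(oldpath, newpath)

Both are plausible readings of the current text; the difference between them is exactly the kind of thing viewpoint #1 would pin down and viewpoint #2 leaves to the implementation.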
Dan O'Reilly points out that this is what many of his users expect -- the
underlying assumption here is that the "user" is first and foremost a user
of the host OS, merely employing sftp as a way to get at some files when
not directly logged into the host. While I agree this is a common
scenario, it is not the only one. Equally valid is the following: the
user employs an sftp client as his *sole* method of accessing some file
store; he has never logged into the host, nor does he know or care what OS
it's running. One day, the systems department replaces the server with a
new machine running a different OS, but also with an sftp server. Our
putative user then performs some sequence of file manipulations he's done
many times before -- but gets different results! He screams:
wasn't using a consistent abstraction supposed to protect him from this
sort of thing? Well, that's the question: is sftp supposed to afford such
protection, or not?
Despite the inherent elegance of a fully-abstract model, and its
advantages in some situations, I have to (reluctantly) say that model #2
is probably the way to go, for a number of reasons:
1) The sftp spec as it stands does not articulate or support viewpoint #1
at all. There is no requirement for fully abstract operation, and in
fact, there is recognition of the opposite principle; from section 6.2
(File Names):
"... It is understood that the lack of well-defined semantics for file
names may cause interoperability problems between clients and servers
using radically different operating systems. However, this approach
is known to work acceptably with most systems, and alternative
approaches that e.g. treat file names as sequences of structured
components are quite complicated."
2) The two models solve related but different problems. One is remote
access to different filesystem types via a usable
least-common-denominator protocol, which will necessarily have some
limitations. The other is defining a consistent, server-independent
remote filing protocol. While the second problem is valid and could
use a solution, I think in reality sftp is more geared toward solving
the first.
3) The fully-abstract requirement will severely limit and complicate
implementation. For example, the Mac OS X HFS+ filesystem is
case-preserving but not case-sensitive. In a directory containing
files "foo" and "bar", the result of renaming "foo" -> "Bar" using
naive server-side semantics is going to be *very* different from what a
user of a traditional Unix system would expect! If abstract operation
required case-sensitivity, how would you implement that on the server?
And would it make any sense to someone logging into the server and
viewing the result? (This divergence is sketched in code below, after
point 4.)
4) Given that people *will* be accessing files both via sftp and via the
host OS, it gets worse -- even if the fully-abstract requirement stated
earlier is met, there's no guarantee the user will be happy with the
server-side result. For example: NTFS allows multiple streams per
file. Sftp has no notion of that, and I imagine that all extant
Windows sftp implementations simply read stream 0. The sftp operation
sequence "create bar; open foo; read foo; write bar; delete foo" will
be sftp-observably identical to "rename foo -> bar" (atomicity and
concurrency issues aside for the moment)... but again with naive
server-side semantics, one will probably end up trashing a multi-stream
file, while the other preserves it (both sequences are sketched below).
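The divergence in point 3 is easy to see with a throwaway Python script, run once on a case-sensitive filesystem and once on a case-insensitive one. The results in the comments are what I'd expect in the common configurations; HFS+ can of course also be formatted case-sensitive:

    import os, tempfile

    d = tempfile.mkdtemp()
    for name in ("foo", "bar"):
        open(os.path.join(d, name), "w").close()

    # The rename the sftp client asked for: foo -> Bar
    os.rename(os.path.join(d, "foo"), os.path.join(d, "Bar"))

    print(sorted(os.listdir(d)))
    # Case-sensitive (typical Unix):        ['Bar', 'bar']  -- both files survive
    # Case-insensitive HFS+ (OS X default): ['Bar']         -- "bar" has been
    #                                       silently replaced by foo's contents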
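And for point 4, here are the two sequences side by side, again written against a paramiko-style SFTPClient and again purely my own sketch. Through sftp's single-data-stream view their end states are identical; on the server, only the first has any chance of preserving what sftp never sees -- extra NTFS streams, forks, and so on:

    def rename_via_server(sftp, src, dst):
        # One SSH_FXP_RENAME round trip; whatever the server's native rename
        # preserves (alternate streams, forks, ACLs) comes along for free.
        sftp.rename(src, dst)

    def rename_via_copy(sftp, src, dst):
        # "create bar; open foo; read foo; write bar; delete foo" -- only the
        # default data stream makes the trip.
        f_out = sftp.open(dst, "wb")   # create bar
        f_in = sftp.open(src, "rb")    # open foo
        data = f_in.read()             # read foo
        f_out.write(data)              # write bar
        f_in.close()
        f_out.close()
        sftp.remove(src)               # delete foo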
- Richard Silverman
slade%shore.net@localhost