Re: filexfer (was: Secure Shell WG: what's left?)

To: ietf-ssh%NetBSD.org@localhost
Subject: Re: filexfer (was: Secure Shell WG: what's left?)
From: der Mouse <mouse%Rodents.Montreal.QC.CA@localhost>
Date: Wed, 10 Aug 2005 02:11:45 -0400 (EDT)

>>>> 	the complexity of this draft seems to be increasing,
>>>> 	due largely to differences in philosophy between posix/unix and
>>>> 	not-posix/unix filesystems.
>>> I prefer to think of this as adding functionality needed for
>>> interoperability that was ignored in earlier versions of this
>>> draft.
>> ...but in the process making it impossible for a moderately large
>> class of systems to support more than ASCII under the protocol as
>> specified at all.  (Strictly, they can't support even that much, but
>> can probably get away with pretending.)
> Are we talking about unicode filenames?

Well, I was, at least.

> I really wish I understood your view point on this, but I don't.

> What is wrong with:

> 1. If the client does not turn off filename translation,
>     the server should either:

>     a. Use the true character set of the file as recorded by
>        the filesystem, if such exists.

Of course.  The "moderately large class of systems" I refer to is those
for which file names as recorded by the filesystem are octet sequences
rather than character sequences, and thus this condition fails.

>     b. Pretend the user is sitting at a terminal.  As such,
>        the terminal has an encoding it uses to display text.

Yes, but the server has no way to tell what it is.

The window I'm typing this mail into happens to be using a font which
is basically ISO 8859-1 (it's 8859-1 plus some glyphs in positions
where 8859-1 does not have printable characters).  If I were to tell it
to switch to, say, an 8859-7 font, a hypothetical sftp running in that
window would have no way to even realize any change occurred, much less
get enough details to do anything useful with it.  And a server process
would be even more disconnected from that encoding change.
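
To make the ambiguity concrete, here is a minimal Python sketch (not
anything from the draft; the sample name is made up for illustration)
showing one octet sequence yielding two different character strings
depending on which charset the reader assumes.  Nothing in the octets
themselves says which reading was intended.

```python
# The same four octets, as a filesystem that stores names as raw
# octet sequences would hand them back.  The name is hypothetical.
raw = b"caf\xe1"

# Read as ISO 8859-1, octet 0xE1 is LATIN SMALL LETTER A WITH ACUTE.
as_latin1 = raw.decode("iso-8859-1")

# Read as ISO 8859-7, the very same octet is GREEK SMALL LETTER ALPHA.
as_greek = raw.decode("iso-8859-7")

# Both decodes succeed, so a server cannot detect a wrong guess;
# the two results are simply different Unicode strings.
print(repr(as_latin1), repr(as_greek), as_latin1 == as_greek)
```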

You could argue that this is a bug in the OS design, failing to treat
data as characters (with character-set information attached) rather
than uninterpreted blobs of bits.  But even if I were to agree with
you, such systems still exist, and I think that rendering filexfer-*
unimplementable on them would be a critical problem (especially as an
implementer who works primarily on one of them).

>     c. If the OS provides no mechanism for determining the user's
>        preferred encoding, but it is still possible for users to use
>        different encodings for filenames,

Exactly the situation that concerns me: an OS which provides no
mechanism for determining from a file name what character set was
intended by that name's creator - or indeed whether any was - but which
happily lets users use whatever encoding their display/input
hardware/software happens to use.

>        the server implementation itself may have to provide a way for
>        the user to configure their preferred encoding.

Perhaps.  It's not a wholly unreasonable thing to provide.  It doesn't
answer the question of what to do if none has been configured, though.
(Requiring that one be configured strikes me as excessive, especially
given the lack of any standardized character set, as far as I know,
which assigns characters to all 256 possible octet values.)

>     d. If said translation fails, the server should set
>        SSH_FILEXFER_ATTR_FLAGS_TRANSLATION_ERR, and place
>        the untranslated name in the attrib untranslated-name
>        field.

This actually does provide a mostly-reasonable tack to take: always do
that.  (Possibly except when the name is pure ASCII, though assuming
even that much can be dangerous; consider a system some users of which
like EBCDIC...or, less far-fetched, KOI-7.)  After all, a lack of
character set information (which is the fundamental problem here) _can_
be looked on as an error arising when attempting to translate an octet
string to Unicode.

I don't think ATTR_FLAGS_TRANSLATION_ERR and untranslated-name were in
the previous version; I'm glad to see them.  Even when there *is*
character set information available, there needs to be something to do
when, for example, an octet is encountered which has no corresponding
character in the set the string is marked as using.

I do note that filexfer-09 provides no guidance on what to put in the
UTF-8-encoded name field when a translation error occurs (a
recommendation for "zero-length string" or "best effort" or some such
would probably be a Good Thing).
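
As a rough sketch of the "always do that" approach, assuming a server
that may or may not have a per-user encoding configured: the Python
below is illustrative only.  TranslatedName, translate_name and
configured_encoding are hypothetical names, the boolean merely stands
in for SSH_FILEXFER_ATTR_FLAGS_TRANSLATION_ERR rather than using its
wire value, and sending a zero-length UTF-8 name on error is just one
of the options mentioned above, not something the draft specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranslatedName:
    utf8_name: bytes                    # contents of the UTF-8 name field
    translation_err: bool               # stand-in for ..._TRANSLATION_ERR
    untranslated_name: Optional[bytes]  # raw octets, set only on error


def translate_name(raw: bytes,
                   configured_encoding: Optional[str] = None) -> TranslatedName:
    """Translate an on-disk octet-string name for the wire (sketch)."""
    if configured_encoding is not None:
        try:
            text = raw.decode(configured_encoding)
            return TranslatedName(text.encode("utf-8"), False, None)
        except UnicodeDecodeError:
            # An octet with no corresponding character in the
            # configured set: fall through to the error case.
            pass
    # No charset information, or the decode failed: report a
    # translation error and carry the raw octets unchanged.
    # Assumption: zero-length string in the UTF-8 name field.
    return TranslatedName(b"", True, raw)
```

With no encoding configured, every name comes back flagged and carried
verbatim in the untranslated field; with one configured, only names
that fail to decode do.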

> 2. If the client does turn off filename translation, the server
>    simply sends the filename data as it reads it from disk.

Of course.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse%rodents.montreal.qc.ca@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

References:
- RE: Secure Shell WG: what's left?
  - From: Richard Whalen
- Re: Secure Shell WG: what's left?
  - From: der Mouse
- Re: filexfer (was: Secure Shell WG: what's left?)
  - From: Joseph Galbraith
