IETF-SSH archive

Re: SFTP and unicode file names...



> IMHO it is important that the server identify its character set up
> front, [...]

What about cases where there is no well-defined character set, such as
most Unix variants?  (Suppose I have a file whose name consists of the
octets 0xc3 0xa1 0xd0 0xbf.  Is that a file named LATIN SMALL LETTER A
WITH ACUTE - CYRILLIC SMALL LETTER PE (UTF-8), a file named LATIN
CAPITAL LETTER A WITH TILDE - INVERTED EXCLAMATION MARK - LATIN CAPITAL
LETTER ETH - INVERTED QUESTION MARK (8859-1[%]), a file named GREEK
CAPITAL LETTER GAMMA - some character whose name I'm not sure of -
GREEK CAPITAL LETTER PI - GREEK CAPITAL LETTER OMEGA WITH TONOS
(8859-7[%]), a file whose name represents the integer 3173013909 rather
than any character sequence (base 254 with digits 0x01-0x2e and
0x30-0xff), or what?  Given how nonsensical the others are, the last
of those may actually be the most plausible.)
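
The four readings above can be reproduced mechanically.  What follows
is a minimal sketch in Python, not anything from the SFTP drafts: the
helper names (names, base254) are made up for illustration, and the
base-254 digit mapping (0x01 -> 0 ... 0x2e -> 45, 0x30 -> 46 ...
0xff -> 253) is one assumed reading of "digits 0x01-0x2e and
0x30-0xff".  Python's codec tables may follow a different revision of
8859-7 than the one the message had in mind, so the name reported for
0xa1 there may vary.

```python
import unicodedata

# The filename from the example, as raw octets.
raw = bytes([0xC3, 0xA1, 0xD0, 0xBF])

def names(encoding):
    """Decode the octets under one assumed charset and name each character."""
    return [unicodedata.name(ch, "<unnamed>") for ch in raw.decode(encoding)]

def base254(octets):
    """Read the octets as a big-endian base-254 number.

    Assumed digit mapping: 0x00 (NUL) and 0x2f ('/') cannot occur in a
    Unix filename component, so the remaining 254 octet values serve as
    digits 0..253 in ascending order.
    """
    value = 0
    for b in octets:
        if b in (0x00, 0x2F):
            raise ValueError("octet cannot appear in a filename component")
        digit = b - 1 if b < 0x2F else b - 2
        value = value * 254 + digit
    return value

for enc in ("utf-8", "iso8859-1", "iso8859-7"):
    print(enc, names(enc))
print("base 254:", base254(raw))
```

All four interpretations succeed without error, which is the point:
nothing in the octets themselves says which one was intended.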

Nobody but the entities (software and/or humans) the file was created
for and used by can tell.  Certainly ssh/sshd can't, not without being
told somehow.

Anything predicated on the assumption that every filename consists of
characters will run into this problem.  I strongly believe there needs
to be some provision for filenames to be treated as opaque octet
sequences rather than insisting that they are character sequences.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse%rodents.montreal.qc.ca@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

[%] Names taken from Unicode 0080-00FF and 0370-03FF; I don't have a
    handy reference to the names 8859 uses.


