IETF-SSH archive
Re: SFTP and unicode file names...
> IMHO it is important that the server identify its character set up
> front, [...]
What about cases where there is no well-defined character set, such as
most Unix variants? (Suppose I have a file whose name consists of the
octets 0xc3 0xa1 0xd0 0xbf. Is that a file named LATIN SMALL LETTER A
WITH ACUTE - CYRILLIC SMALL LETTER PE (UTF-8), a file named LATIN
CAPITAL LETTER A WITH TILDE - INVERTED EXCLAMATION MARK - LATIN CAPITAL
LETTER ETH - INVERTED QUESTION MARK (8859-1[%]), a file named GREEK
CAPITAL LETTER GAMMA - some character whose name I'm not sure of -
GREEK CAPITAL LETTER PI - GREEK CAPITAL LETTER OMEGA WITH TONOS
(8859-7[%]), a file whose name represents the integer 3173013909 rather
than any character sequence (base 254 with digits 0x01-0x2e and
0x30-0xff), or what? (Given how nonsensical the others are, the last
of those may actually be the most plausible.))
Nobody but the entities (software and/or humans) the file was created
for and used by can tell. Certainly ssh/sshd can't, not without being
told somehow.
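
As a rough illustration of the ambiguity, the following Python sketch
decodes the same four octets under each of the readings listed above.
The helper names char_names and base254_value are made up for this
example, and the digit mapping is only my reading of the base-254
scheme described above (0x00 and 0x2f are excluded because a Unix
filename component cannot contain NUL or '/').

```python
import unicodedata

# The four octets from the example filename.
NAME = bytes([0xC3, 0xA1, 0xD0, 0xBF])


def char_names(raw, encoding):
    """Decode raw under the given charset and return the Unicode character names."""
    return [unicodedata.name(ch, "<unnamed>") for ch in raw.decode(encoding)]


def base254_value(raw):
    """Read raw as a big-endian base-254 integer.

    Assumed digit mapping: octets 0x01-0x2e are digits 0-45 and
    octets 0x30-0xff are digits 46-253.
    """
    value = 0
    for octet in raw:
        if octet in (0x00, 0x2F):
            raise ValueError("NUL and '/' are not digits in this scheme")
        digit = octet - 1 if octet < 0x2F else octet - 2
        value = value * 254 + digit
    return value


if __name__ == "__main__":
    for encoding in ("utf-8", "iso8859-1", "iso8859-7"):
        print(encoding, char_names(NAME, encoding))
    print("base 254:", base254_value(NAME))
```

Every one of these interpretations succeeds without error, which is
the point: nothing in the octets themselves says which one was meant.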
Anything predicated on the assumption that every filename consists of
characters will run into this problem. I strongly believe there needs
to be some provision for filenames to be treated as opaque octet
sequences rather than insisting that they are character sequences.
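
To sketch what "opaque octet sequences" could look like in practice,
here is a small Python example, assuming a Unix-like filesystem that
stores names as raw bytes. It creates a file whose name is the four
octets above, lists the directory without ever decoding the name, and
produces an escaped form that is meant for display only. The function
names create_and_list and display_form are hypothetical; this is not
how any particular SFTP implementation works.

```python
import os
import tempfile


def create_and_list(raw_name):
    """Create an empty file named raw_name (bytes) in a fresh temporary
    directory and return the directory listing as a list of bytes names."""
    with tempfile.TemporaryDirectory() as tmp:
        directory = os.fsencode(tmp)
        # Bytes paths are passed to the OS as-is; no charset is assumed.
        with open(os.path.join(directory, raw_name), "wb"):
            pass
        # Passing bytes to os.listdir makes it return bytes names.
        return sorted(os.listdir(directory))


def display_form(name):
    """Render a bytes name for humans. Printable ASCII (except backslash)
    is shown literally; every other octet becomes a \\xNN escape. The
    result is for display only and is never used to open the file."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F and b != 0x5C else "\\x{:02x}".format(b)
        for b in name
    )


if __name__ == "__main__":
    names = create_and_list(bytes([0xC3, 0xA1, 0xD0, 0xBF]))
    for name in names:
        print(display_form(name))
```

The design choice here is that the bytes value is the filename; any
character interpretation is layered on top for presentation and can be
wrong without affecting which file gets opened.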
/~\ The ASCII                             der Mouse
\ / Ribbon Campaign
 X  Against HTML               mouse%rodents.montreal.qc.ca@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
[%] Names taken from Unicode 0080-00FF and 0370-03FF; I don't have a
handy reference to the names 8859 uses.