> Okay, I've been forced to face up to the unfortunate truth:
>
> Unix directories can contain files encoded in multiple
> different char-sets and the server has no way to tell
> what these multiple char-sets are and translate them
> to UTF-8. Because the transformation is one way, once
> the server has mistranslated the filename, there
> is no way for the client to get back to the original
> data.
>
> So, for these file-systems, the best possible thing
> to do is send the filename raw and let the client
> (with help from the user decode it.)
>
> On the other hand, maximum possible interoperability
> between different language and regions is obtained
> through use of UTF-8 where available.

A user can use cat/type/ls/dir <FILENAME> to access a local file and should be able to do the same with SFTP. For example, "echo 'get <FILENAME>' | sftp localhost" should get the same file. In all cases <FILENAME> should be the same and is encoded in the user's charmap (charset/codeset/etc.).

> I haven't been able to come up with a solution I
> really like.
>
> Here are some possibilities:
>
> 1. Let the server say what it is going to use,
>    UTF-8 or 'undefined-raw' at the beginning
>    of the sftp session.
>
[SNIP]

I'm not sure that the server should be the one to decide the encoding, 'UTF-8' or 'RAW'. Since a client can request more than one file in an SFTP session, negotiation of the file name encoding should happen at the beginning of the session. The client should request the encoding from the server.

I guess that a new extension "encoding" will solve the problem:

1.) The client sends the server a list of accepted encodings and the server returns its preferred one, or "RAW" or "UTF-8".

To do this, an sftp implementation MUST implement the extension "encoding". The extension should be defined in the draft the same way "newline" is defined.

I am not sure that sftp can use names like "ascii", "usascii", "C", "POSIX", "ANSI_X3.4....", since ASCII defines only a 7-bit charset. When an SFTP server supports a 7-bit encoding it should (must?) reject file names containing symbols with a code greater than 127.

When the encoding is not set, the server should treat filenames as being in "raw" or "utf-8" format. This must be announced in "Server Initialization". An empty "encoding" is an alias for "RAW".

When the encoding is set, the server must convert "local filename in encoding" <-> "wire filename in utf-8". The client may convert "wire filename in utf-8" <-> "local filename in encoding". Note that the client knows which name conversion the server performs.

I guess that this solution is interoperable with SFTP clients version 1, 2 or 3. For version 4 (four) clients, when the server supports encoding it should announce 3 (three) as its maximum version. For client versions 1, 2 and 3 the server must use "RAW".

A version N (N >= 5) server must support the "UTF-8", "RAW" and "ISO8859-1" encodings.
A version N (N >= 5) server may support the "ISO8859-X" encodings, where X is in the range 2-15.
A version N (N >= 5) client must support the "encoding" extension with the "UTF-8" and "RAW" encodings.

P.S.:
As an example, Cyrillic uses many encodings. The most popular are ISO8859-5, KOI8-R and CP1251. With Cyrillic, one UTF-8 file name can address different files on the file system, depending on the encoding. In this case the SFTP client, with help from the user, is responsible for selecting the correct encoding. This is the same situation as access to a local file system.

I don't have a problem addressing the correct file name on a remote host. My system is properly set up and I can use UTF-8, ISO8859-5, KOI8-R and CP1251 in file/directory names and in GUI terminals.

For the test, in the directory $HOME/tmp/cyr I have four files with names in the format f.<ENCODING>.<NAME>, where <ENCODING> is one of those mentioned above and <NAME> is the first three letters of the Cyrillic alphabet in uppercase followed by the same letters in lowercase, in the same encoding. The content of each file is "data.<ENCODING>:<NAME>", where <ENCODING> and <NAME> match the file name. In four xterms, one for each encoding, I run the same command sequence.
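The conversion rule and the four test file names described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names (choose_encoding, wire_to_local, local_to_wire), the codec table and the negotiation order are hypothetical and are not taken from any SFTP draft or from OpenSSH. It only shows that one UTF-8 wire name becomes a different byte sequence on disk for each negotiated encoding, and that "RAW" (or an empty encoding) passes the bytes through untouched.

```python
# Hypothetical sketch of the proposed "encoding" extension; not a real SFTP API.

# <NAME> from the test: first three Cyrillic letters, uppercase then lowercase.
NAME = "\u0410\u0411\u0412\u0430\u0431\u0432"

# Assumed mapping from the encoding names in this mail to Python codec names.
CODECS = {
    "UTF-8": "utf-8",
    "ISO8859-5": "iso8859_5",
    "KOI8-R": "koi8_r",
    "CP1251": "cp1251",
}


def choose_encoding(client_accepts, server_prefers):
    """Return the server's most preferred encoding that the client accepts.

    Falls back to "RAW" when there is no common encoding (an assumption;
    the proposal also allows a "UTF-8" fallback).
    """
    for enc in server_prefers:
        if enc in client_accepts:
            return enc
    return "RAW"


def wire_to_local(wire_name: bytes, encoding: str) -> bytes:
    """Server side: UTF-8 wire name -> bytes of the local file name."""
    if encoding in ("", "RAW"):  # empty encoding is an alias for "RAW"
        return wire_name
    return wire_name.decode("utf-8").encode(CODECS[encoding])


def local_to_wire(local_name: bytes, encoding: str) -> bytes:
    """Server side: bytes of the local file name -> UTF-8 wire name."""
    if encoding in ("", "RAW"):
        return local_name
    return local_name.decode(CODECS[encoding]).encode("utf-8")


if __name__ == "__main__":
    wire = NAME.encode("utf-8")
    for enc in CODECS:
        local = wire_to_local(wire, enc)
        # The conversion is reversible as long as the encoding is known.
        assert local_to_wire(local, enc) == wire
        print(f"{enc:10} {local.hex(' ')}")
```

Running the script prints the on-disk bytes of <NAME> for each encoding; the four sequences differ, which is why the same wire name can address four different files.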
The results are in the attached images "ssh_session.UTF-8.png", "ssh_session.ISO8859-5.png", "ssh_session.KOI8-R.png" and "ssh_session.CP1251.png". With the command "echo get tmp/cyr/f.*.<NAME> | sftp localhost", where sftp is the OpenSSH SFTP version 3 client, the file that I get depends on my locale charmap.

Regards,
Roumen
Attachments (PNG images):
  ssh_session.UTF-8.png
  ssh_session.ISO8859-5.png
  ssh_session.KOI8-R.png
  ssh_session.CP1251.png