To: Niels Möller <nisse%lysator.liu.se@localhost>, Joseph Galbraith <galb-list%vandyke.com@localhost>
Subject: Re: SFTP and unicode file names...
From: Jeffrey Hutzelman <jhutz%cmu.edu@localhost>
Date: Mon, 11 Oct 2004 11:52:41 -0400

On Monday, October 11, 2004 16:54:06 +0200 Niels Möller <nisse%lysator.liu.se@localhost> wrote:

Joseph Galbraith <galb-list%vandyke.com@localhost> writes:

I would definitely prefer to see the server do the translation
when it can... that's why we went to UTF-8 in the first place.


I think there's one more use case that you need to consider, which I
expect is quite common:

The remote filesystem using the foo charset. The local system using
the same foo charset. Why do I think this is common? Because on both
sides, it's the same user's files, and the user is likely to use his
or her favourite charset (iso-8859-1, utf-8, euc-jis, whatever) on
most or all systems where he or she has an account.

What you call "raw mode" will work fine in this case, no matter if the
sftp implementation on server or client side knows about the foo
charset.

I like Jeffrey Hutzelman's proposal: Have two modes of operation, and
let the client select which mode it prefers,

 1. Server tells client the server's best guess as to what character
    set is used for filenames, and doesn't convert filenames in any
    way.

 2. All filenames on the wire are utf-8. Server converts filenames to
    and from utf-8 on a best effort basis, according to its best
    guess of the actual charset. (What's the right thing to do if/when
    conversion fails, I don't know yet).

IMHO it is important that the server identify its character set up
front, so that the client is able to use that information in deciding
which mode to use. Also, the server needs to be able to indicate that it
is incapable of performing Unicode conversion. I'd like to be able to say
that the server MUST be capable of performing the conversion, but I don't
think that's realistic.
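
A rough sketch of how a client might use that up-front information to
pick a mode. Everything here is an assumption made for illustration: the
function names, the "raw"/"utf8" labels, and the idea that the server
reports a charset name plus a can-convert flag are not taken from any
draft.

```python
import codecs
from typing import Optional


def same_charset(a: str, b: str) -> bool:
    """Compare two charset names, treating aliases (latin1, ISO-8859-1) as equal."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # Unknown to this Python build: fall back to a plain name comparison.
        return a.strip().lower() == b.strip().lower()


def choose_mode(local_charset: str,
                server_charset: Optional[str],
                server_can_convert: bool) -> str:
    """Hypothetical client policy; returns "raw" or "utf8"."""
    # Same charset on both ends: unconverted names work as they are.
    if server_charset is not None and same_charset(local_charset, server_charset):
        return "raw"
    # Charsets differ and the server offers conversion: use UTF-8 on the wire.
    if server_can_convert:
        return "utf8"
    # The server cannot convert, so raw mode is the only option left.
    return "raw"
```

In the last case the client is left to convert by itself, using the
charset the server reported, if it knows that charset at all.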

If the server is capable of doing conversion, then conversion from the
local character set to unicode should not be able to fail. If it does,
something is very wrong.

I can think of only two cases in which the conversion from UTF-8 to the
local character set can fail. The first is when the input is somehow
invalid (bad UTF-8, contains illegal Unicode code points, etc). We could
handle this in a number of ways, up to and including terminating the
connection. :-)
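
As an illustration of the first case, here is a minimal sketch of the
validity check in Python. The function name is hypothetical, and what the
caller does with a rejected name (error reply, disconnect) is
deliberately left open.

```python
from typing import Optional


def decode_wire_name(raw: bytes) -> Optional[str]:
    """Return the decoded filename, or None if the bytes are not valid UTF-8.

    Python's strict UTF-8 decoder rejects truncated sequences, overlong
    encodings, stray continuation bytes and encoded surrogates.
    """
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

A server built along these lines would run every incoming name through
this check before attempting any charset mapping.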

The second case is when the input is valid, but contains characters not
present in the local character set. The server's mapping should be good
enough to handle cases where the same local character can be represented
in multiple ways in unicode (for example, there are at least two ways to
write Å in unicode, but only one way in iso-8859-1). For characters which
are genuinely not in the local character set, the server can return an
appropriate error depending on context. For example, trying to open a
file whose name contains an untranslatable character will always fail
with something ENOENT-like. Trying to create such a file should fail with
an illegal filename.
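
A sketch of the second case, assuming the server composes names to NFC
before mapping them to the local charset. The function names and the
status strings "no-such-file" and "invalid-filename" are placeholders
invented for this example, not protocol constants. NFC is only one
possible mapping strategy and will not suit every local charset.

```python
import unicodedata
from typing import Optional, Tuple


def to_local(name: str, charset: str = "iso-8859-1") -> bytes:
    """Map a Unicode filename to the local charset.

    NFC folds "A" + COMBINING RING ABOVE into the single code point
    U+00C5, so both spellings reach the same local byte.
    Raises UnicodeEncodeError if a character has no local equivalent.
    """
    composed = unicodedata.normalize("NFC", name)
    return composed.encode(charset)


def translate_for(operation: str, name: str,
                  charset: str = "iso-8859-1") -> Tuple[str, Optional[bytes]]:
    """Return (status, local_name); the status labels are hypothetical."""
    try:
        return ("ok", to_local(name, charset))
    except UnicodeEncodeError:
        # No file with this name can exist locally, so a lookup simply
        # finds nothing; creating one is a naming error instead.
        if operation == "open":
            return ("no-such-file", None)
        return ("invalid-filename", None)
```

With these definitions, translate_for("open", "A\u030a.txt") and
translate_for("open", "\u00c5.txt") both yield the same iso-8859-1 name,
while a name containing a character outside iso-8859-1 is reported
differently for "open" and "create".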




-- Jeff
