IETF-SSH archive


Re: SFTP and unicode file names...



Jeffrey Hutzelman <jhutz%cmu.edu@localhost> writes:

> On Friday, October 08, 2004 13:28:45 +0200 Niels Möller
> <nisse%lysator.liu.se@localhost> wrote:
> 
> > (Note that utf-8 with undefined normalization is suboptimal for
> > filenames and for identifiers in general).
> 
> Yes.  I'd normally argue that in any case where UTF-8 will be used on
> the wire, it probably ought to be normalized.  However, the key issue
> here is not really normalization of names sent by the server but of
> those sent by the client, which refer to the names in the
> filesystem.

There's actually one important use case where normalization doesn't
matter: the client first requests a directory listing and then
presents those names to the user, via some graphical file dialog or
via TAB-completion. In this case, the only important thing is that
the client doesn't modify the encoding when a name is copied from the
directory listing to the open request.
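A minimal Python sketch of this case, not from the original thread. It models a server directory as a dict keyed by raw name bytes. The names `DIRECTORY`, `listing`, `open_by_name` and the sample file name are hypothetical. A client that sends the listed bytes back unchanged always finds the file, whatever normalization the filesystem uses. A client that re-normalizes the name first may not.

```python
import unicodedata

# Hypothetical server-side directory: raw name bytes -> file contents.
# This name is stored decomposed (NFD): "e" followed by U+0301.
DIRECTORY = {"cafe\u0301.txt".encode("utf-8"): b"hello"}


def listing() -> list[bytes]:
    """Return directory entries exactly as stored, with no conversion."""
    return list(DIRECTORY)


def open_by_name(name: bytes) -> bytes:
    """Look up a file by its exact byte string, as a filesystem would."""
    return DIRECTORY[name]


name = listing()[0]

# Careful client: copies the bytes from the listing into the open request.
assert open_by_name(name) == b"hello"

# Careless client: decodes, normalizes to NFC, and re-encodes before opening.
altered = unicodedata.normalize("NFC", name.decode("utf-8")).encode("utf-8")
print(altered == name)  # False: "e" + U+0301 became the single code point U+00E9
try:
    open_by_name(altered)
except KeyError:
    print("lookup failed")
```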

> Unfortunately, there are filesystems that use unnormalized Unicode,

That's going to be painful. Examples?

> There is one thing I'd do differently.  Whenever possible, I'd prefer
> that the server not just advertise the character set but actually do
> the conversions. [...]

> Let the server advertise the character set it thinks is in use.
> Let the client decide whether it wants UTF-8 or raw bytes.

Good idea. I agree it makes sense to give clients that choice.

I'm not so sure I share your preference, though; I'm afraid subtle
problems may occur when file names are converted to utf-8, processed
by client UI code, and converted back.
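One way such a problem could arise, sketched in Python under assumptions that are not in the thread: the server's local charset is taken to be ISO-8859-1, and the server converts names to UTF-8 for the client. The helper names `to_wire` and `from_wire` are hypothetical. The conversion round-trips cleanly as long as the client leaves the string alone. If client UI code happens to decompose the name (NFD), the result can no longer be converted back to the server's charset.

```python
import unicodedata

SERVER_CHARSET = "iso-8859-1"  # assumed local charset on the server


def to_wire(local_name: bytes) -> bytes:
    """Server side: convert a local file name to UTF-8 for transmission."""
    return local_name.decode(SERVER_CHARSET).encode("utf-8")


def from_wire(utf8_name: bytes) -> bytes:
    """Server side: convert a UTF-8 name from the client to the local charset."""
    return utf8_name.decode("utf-8").encode(SERVER_CHARSET)


local = "r\u00e9sum\u00e9".encode(SERVER_CHARSET)  # b"r\xe9sum\xe9"

# Untouched round trip: the server gets back exactly the bytes it started with.
assert from_wire(to_wire(local)) == local

# Client UI code that decomposes the string (NFD) before sending it back.
shown = unicodedata.normalize("NFD", to_wire(local).decode("utf-8"))
try:
    from_wire(shown.encode("utf-8"))
except UnicodeEncodeError:
    # U+0301 COMBINING ACUTE ACCENT has no ISO-8859-1 encoding.
    print("cannot convert back to", SERVER_CHARSET)
```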

One interesting case: when converting from the local charset to utf-8,
it is natural to require that the server use normalized utf-8, right?
(That is, canonical normalization, not compatibility normalization.)
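The distinction matters for file names. Canonical normalization (NFC/NFD) only merges sequences that Unicode treats as the same character. Compatibility normalization (NFKC/NFKD) also folds distinct characters such as ligatures and full-width forms, so two different existing files could end up with the same name. A small illustration using Python's standard `unicodedata` module; the sample strings are arbitrary:

```python
import unicodedata

decomposed = "e\u0301"       # "e" + COMBINING ACUTE ACCENT
ligature = "\ufb01le.txt"    # U+FB01 LATIN SMALL LIGATURE FI, then "le.txt"

# Canonical composition joins the two code points into U+00E9 ...
print(unicodedata.normalize("NFC", decomposed) == "\u00e9")  # True
# ... but leaves the ligature alone: it is a distinct character.
print(unicodedata.normalize("NFC", ligature) == ligature)    # True
# Compatibility composition folds the ligature to plain "fi".
print(unicodedata.normalize("NFKC", ligature))               # file.txt
```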

However, if the server's local character set happens to be utf-8, and
some file names are unnormalized, then having the server normalize the
data for transmission is likely to break things.
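A minimal sketch of how this can break, assuming a hypothetical server whose UTF-8 filesystem stores names as raw bytes without normalizing them, and which NFC-normalizes names when it lists a directory. The `DIRECTORY` contents and the `normalized_listing` helper are hypothetical. Two files whose names differ only in normalization become indistinguishable on the wire, and only one of them can be reached with the transmitted name.

```python
import unicodedata

# Hypothetical UTF-8 filesystem that stores names unnormalized.
# These are two different files with visually identical names.
DIRECTORY = {
    "caf\u00e9".encode("utf-8"): b"precomposed (NFC)",
    "cafe\u0301".encode("utf-8"): b"decomposed (NFD)",
}


def normalized_listing() -> list[bytes]:
    """Server normalizes each name to NFC before sending it to the client."""
    return [
        unicodedata.normalize("NFC", raw.decode("utf-8")).encode("utf-8")
        for raw in DIRECTORY
    ]


wire_names = normalized_listing()
print(len(set(wire_names)))  # 1: both entries now look the same
# The client can only ask for that one name, so the NFD file is unreachable.
print([DIRECTORY.get(name) for name in wire_names])
```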

Regards,
/Niels


