IETF-SSH archive
Re: SFTP and unicode file names...
Jeffrey Hutzelman <jhutz%cmu.edu@localhost> writes:
> On Friday, October 08, 2004 13:28:45 +0200 Niels Möller
> <nisse%lysator.liu.se@localhost> wrote:
>
> > (Note that utf-8 with undefined normalization is suboptimal for
> > filenames and for identifiers in general).
>
> Yes. I'd normally argue that in any case where UTF-8 will be used on
> the wire, it probably ought to be normalized. However, the key issue
> here is not really normalization of names sent by the server but of
> those sent by the client, which refer to the names in the
> filesystem.
There's actually one important use case where normalization doesn't
matter: the client first requests a directory listing and then
presents those names to the user, via a graphical file dialog or via
TAB completion. In this case, the only important thing is that the
client doesn't modify the encoding when a name is copied from the
directory listing to the open request.
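To illustrate why the client must copy names byte for byte: the same visible name "café" has two canonically equivalent UTF-8 encodings, and a client that normalizes before sending can miss the file. This is a minimal Python sketch; the dict `DIRECTORY` and the functions `list_directory` and `open_file` are hypothetical stand-ins for a server directory and SFTP requests, not any real SFTP API.

```python
import unicodedata

# Hypothetical server directory: file names stored as raw bytes, as on a
# POSIX filesystem. This name happens to be in decomposed (NFD) form.
DIRECTORY = {"cafe\u0301.txt".encode("utf-8"): b"contents"}


def list_directory() -> list[bytes]:
    """Stand-in for a directory listing: returns names unchanged."""
    return list(DIRECTORY)


def open_file(name: bytes) -> bytes:
    """Stand-in for an open request: an exact byte-for-byte lookup."""
    if name not in DIRECTORY:
        raise FileNotFoundError(name)
    return DIRECTORY[name]


name = list_directory()[0]

# Copying the name unchanged from the listing to the open request works,
# regardless of how (or whether) the name is normalized.
assert open_file(name) == b"contents"

# A client that "helpfully" normalizes to NFC sends different bytes,
# so the lookup fails even though the name looks identical on screen.
nfc = unicodedata.normalize("NFC", name.decode("utf-8")).encode("utf-8")
print(nfc == name)  # False
try:
    open_file(nfc)
except FileNotFoundError:
    print("not found after normalization")
```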
> Unfortunately, there are filesystems that use unnormalized Unicode,
That's going to be painful. Examples?
> There is one thing I'd do differently. Whenever possible, I'd prefer
> that the server not just advertise the character set but actually do
> the conversions. [...]
> Let the server advertise the character set it thinks is in use.
> Let the client decide whether it wants UTF-8 or raw bytes.
Good idea. I agree it makes sense to give clients that choice.
I'm not so sure I share your preference, though; I'm afraid subtle
problems may occur when file names are converted to UTF-8, processed
by client UI code, and converted back.
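The choice between UTF-8 and raw bytes, and the round-trip worry, can be sketched as follows. The function names, the `want_utf8` flag, and ISO-8859-1 as the server's advertised local charset are all hypothetical illustrations, not part of any SFTP draft. The server converts to UTF-8 only when the client asks for it; the last lines show one way the trip back can fail, assuming the client UI code decomposes the name (NFD) before sending it back.

```python
import unicodedata

SERVER_CHARSET = "iso-8859-1"  # hypothetical: the charset the server advertises


def name_to_client(raw: bytes, want_utf8: bool) -> bytes:
    """Server side: send raw bytes, or convert local charset -> UTF-8."""
    if not want_utf8:
        return raw
    return raw.decode(SERVER_CHARSET).encode("utf-8")


def name_from_client(wire: bytes, want_utf8: bool) -> bytes:
    """Server side: map a client-supplied name back to local bytes."""
    if not want_utf8:
        return wire
    # Raises UnicodeEncodeError if the text has no ISO-8859-1 representation.
    return wire.decode("utf-8").encode(SERVER_CHARSET)


on_disk = "caf\u00e9.txt".encode(SERVER_CHARSET)  # b"caf\xe9.txt"

# Raw mode: the bytes come back exactly as they went out.
assert name_from_client(name_to_client(on_disk, False), False) == on_disk

# UTF-8 mode, name passed through untouched: also round-trips.
wire = name_to_client(on_disk, True)
assert name_from_client(wire, True) == on_disk

# UTF-8 mode, but client UI code decomposes the name (NFD) first:
# "e" + U+0301 COMBINING ACUTE ACCENT has no ISO-8859-1 encoding.
edited = unicodedata.normalize("NFD", wire.decode("utf-8")).encode("utf-8")
try:
    name_from_client(edited, True)
except UnicodeEncodeError:
    print("name no longer maps back to the local charset")
```

Whether the server should renormalize before converting back is left open here; the sketch only shows that the naive round trip is not guaranteed once client code touches the text.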
One interesting case: when converting from the local charset to UTF-8,
it is natural to require that the server use normalized UTF-8, right?
(With canonical normalization, not compatibility normalization.)
However, if the server's local character set happens to be UTF-8 and
some file names are unnormalized, then having the server normalize the
data for transmission is likely to break things: the normalized name
the client receives no longer matches the bytes on disk, so passing it
back in an open request may not find the file.
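A small sketch of that failure, assuming a hypothetical server whose local charset is UTF-8, which normalizes names to NFC when listing them but passes client-supplied bytes straight to the filesystem. `FILES`, `listing_name`, and `server_open` are illustrative names only. The last lines show the canonical versus compatibility distinction using Python's `unicodedata.normalize`.

```python
import unicodedata

# Hypothetical server: local charset is UTF-8, and the filesystem stores
# this name in decomposed (NFD) form.
FILES = {"cafe\u0301.txt".encode("utf-8")}


def listing_name(raw: bytes) -> bytes:
    """Server normalizes each name to NFC (canonical) for transmission."""
    return unicodedata.normalize("NFC", raw.decode("utf-8")).encode("utf-8")


def server_open(wire: bytes) -> bool:
    """Server passes the client's UTF-8 bytes straight to the filesystem."""
    return wire in FILES


sent = listing_name(next(iter(FILES)))
print(server_open(sent))  # False: the NFC name is not what is on disk

# Canonical vs compatibility normalization: NFC leaves the "fi" ligature
# U+FB01 alone, while NFKC folds it into the two letters "f" + "i".
print(unicodedata.normalize("NFC", "\ufb01le") == "\ufb01le")  # True
print(unicodedata.normalize("NFKC", "\ufb01le"))               # file
```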
Regards,
/Niels