IETF-SSH archive


Re: SFTP and unicode file names...

On Friday, October 08, 2004 19:46:30 +0200 Niels Möller <nisse%lysator.liu.se@localhost> wrote:

Jeffrey Hutzelman <jhutz%cmu.edu@localhost> writes:

On Friday, October 08, 2004 13:28:45 +0200 Niels Möller
<nisse%lysator.liu.se@localhost> wrote:

> (Note that utf-8 with undefined normalization is suboptimal for
> filenames and for identifiers in general).

Yes.  I'd normally argue that in any case where UTF-8 will be used on
the wire, it probably ought to be normalized.  However, the key issue
here is not really normalization of names sent by the server but of
those sent by the client, which refer to the names in the
filesystem.

There's actually one important use case where normalization doesn't
matter. That is if the client first requests a directory listing, and
then presents those names to the user, via some graphical file dialog,
or via TAB-completion. In this case, the only important thing is that
the client doesn't modify the coding when a name is copied from the
directory listing to the open request.
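To illustrate the point about copying names unmodified, here is a minimal Python sketch. It assumes a server filesystem that stores the name in decomposed (NFD) form and an open request that does a byte-exact lookup. The `server_dir` dict and the `read_dir`/`open_file` helpers are hypothetical stand-ins for the SFTP directory-listing and open requests, not any real API.

```python
import unicodedata

# Hypothetical server directory on a filesystem that does not normalize:
# the name is stored in decomposed (NFD) form, "e" followed by U+0301.
server_dir = {"cafe\u0301.txt".encode("utf-8"): b"contents"}

def read_dir() -> list[bytes]:
    """Hypothetical stand-in for a directory-listing request: names exactly as stored."""
    return list(server_dir)

def open_file(name: bytes):
    """Hypothetical stand-in for an open request: byte-exact lookup, no normalization."""
    return server_dir.get(name)

listed = read_dir()[0]

# A client that copies the listed name verbatim: the open succeeds.
verbatim = open_file(listed)

# A client that normalizes to NFC (e.g. in its UI code) and sends that back: the open fails,
# because the precomposed "é" encodes to different bytes than "e" + U+0301.
renormalized = open_file(
    unicodedata.normalize("NFC", listed.decode("utf-8")).encode("utf-8")
)

print(verbatim, renormalized)
```

Only the verbatim copy reaches the file, even though both names look identical to a user.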

Correct.


Unfortunately, there are filesystems that use unnormalized Unicode,

That's going to be painful. Examples?

I don't believe Windows normalizes anything.



Good idea. I agree it makes sense to give clients that choice.

I'm not so sure I share your preference, though; I'm afraid subtle
problems may occur when file names are converted to utf-8, processed
by client ui code, and back.

One interesting case: When converting from the local charset to utf-8,
it is natural to require that the server uses normalized utf-8, right?
(With canonical normalization, not compatibility normalization).
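The difference between canonical and compatibility normalization can be seen with Python's standard `unicodedata` module. In this sketch the file name is an arbitrary example chosen to contain both a decomposed accent and the "ﬁ" ligature (U+FB01).

```python
import unicodedata

# Example name: "ﬁ" ligature (U+FB01) plus a decomposed "é" ("e" + U+0301).
name = "\ufb01le-e\u0301.txt"

nfc = unicodedata.normalize("NFC", name)    # canonical composition
nfkc = unicodedata.normalize("NFKC", name)  # compatibility composition

print(ascii(nfc))   # accent composed to U+00E9, ligature kept
print(ascii(nfkc))  # accent composed, ligature also folded to plain "fi"
```

Canonical normalization only merges sequences Unicode treats as the same character. Compatibility normalization also folds visually or semantically distinct characters, which would silently rename files.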

However, if the server's local character set happens to be utf-8, and
some file names are unnormalized, then having the server normalize the
data for transmission is likely to break things.
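One concrete way normalization on the server breaks things is that it is not one-to-one. A UTF-8 filesystem that does not normalize can hold two distinct files whose names differ only in composition, and normalizing both for transmission collapses them into one name on the wire. The `normalized_listing` function below is a hypothetical model of such a server, not code from any implementation.

```python
import unicodedata

def normalized_listing(raw_names: list[bytes]) -> list[str]:
    """Hypothetical server behaviour: decode each UTF-8 name and send it NFC-normalized."""
    return [unicodedata.normalize("NFC", n.decode("utf-8")) for n in raw_names]

# Two distinct files on a UTF-8 filesystem that does not normalize:
# "été" precomposed, and the same word with "e" + U+0301 combining accents.
raw = ["\u00e9t\u00e9".encode("utf-8"), "e\u0301te\u0301".encode("utf-8")]

wire = normalized_listing(raw)

# Two distinct names on disk, but only one distinct name on the wire,
# so the client has no way to refer to the decomposed file.
print(len(set(raw)), len(set(wire)))
```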

Yes. If the server's local character set is UTF-8, it SHOULD NOT normalize. If it's something else, then the server should convert to UTF-8 using whatever rules it deems appropriate, and should accept any reasonable representation from the client (particularly, it SHOULD NOT require normalized input).
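The server policy described above might be sketched as follows. This is a minimal illustration, assuming the server knows its local charset by a Python codec name. The helpers `name_to_wire` and `name_from_wire` are hypothetical, and using NFC for the non-UTF-8 case is just one possible choice of "appropriate rules", not something the message prescribes.

```python
import unicodedata

def _is_utf8(charset: str) -> bool:
    return charset.lower() in ("utf-8", "utf8")

def name_to_wire(local_name: bytes, local_charset: str) -> bytes:
    """Server -> client: produce the UTF-8 file name sent on the wire."""
    if _is_utf8(local_charset):
        # Local names are already UTF-8: pass the bytes through, never normalize.
        return local_name
    # Otherwise convert to UTF-8; NFC here is one possible rule, not a mandated one.
    text = unicodedata.normalize("NFC", local_name.decode(local_charset))
    return text.encode("utf-8")

def name_from_wire(wire_name: bytes, local_charset: str) -> bytes:
    """Client -> server: map a UTF-8 file name from the client to a local name."""
    if _is_utf8(local_charset):
        return wire_name
    # Do not require normalized input: compose first, so that "e" + U+0301
    # still maps to a legacy charset that only has a precomposed "é".
    # Names with no representation in the local charset raise UnicodeEncodeError.
    text = unicodedata.normalize("NFC", wire_name.decode("utf-8"))
    return text.encode(local_charset)
```

For example, with a Latin-1 server, `name_from_wire("e\u0301".encode("utf-8"), "latin-1")` yields `b"\xe9"`, the same local name a client sending precomposed "é" would reach.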