IETF-SSH archive


Re: Internationalization: UTF-8 for file names



"Joseph Galbraith" <galb-list%vandyke.com@localhost> writes:

> I've now read the NFS specification on the matter
> [RFC3010, section 11.]  NFS elected to not deal
> with the normalization.

I don't think that's correct. NFSv4 doesn't require normalization on
the wire, but it *does* say that implementations must handle
normalization properly. Quoting the end of section 11 (my emphasis):

   The NFS version 4 protocol does not mandate the use of a particular
   normalization form at this time.  A later revision of this
   specification may specify a particular normalization form.
   Therefore, the server and client can expect that they may receive
   unnormalized characters within protocol requests and responses.  If
   the operating environment requires normalization, then the
   implementation *MUST NORMALIZE* the various UTF-8 encoded strings
   within the protocol before presenting the information to an
   application (at the client) or local file system (at the server).

I admit the text is a little vague, but my interpretation is this: if
the operating system's open() function handles Unicode equivalence
properly and does the right thing (unlikely on a Unix system; I don't
know about Windows), then the NFS implementation can pass file names
directly to open() without normalizing them. However, if open()
distinguishes between equivalent forms of a file name (i.e. it does
not implement Unicode equivalence in the way that is *required* for
Unicode conformance), then the NFS code must normalize names to the
form the rest of the system uses.
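
The server-side case described above can be sketched in a few lines of
Python. This is only an illustration under assumptions: the local file
system is assumed to store names in NFC and to compare them byte for
byte, and to_local_name is a hypothetical helper, not part of any NFS
or sftp implementation.

```python
import unicodedata

# Assumption: the local file system stores names in NFC and its open()
# compares names byte for byte, so equivalent forms would otherwise differ.
LOCAL_FORM = "NFC"


def to_local_name(wire_name: bytes, local_form: str = LOCAL_FORM) -> bytes:
    """Hypothetical helper: decode a UTF-8 file name received from the
    peer and normalize it to the local form before it reaches open()."""
    text = wire_name.decode("utf-8")  # raises UnicodeDecodeError on bad input
    return unicodedata.normalize(local_form, text).encode("utf-8")


# "é" precomposed (U+00E9) versus "e" followed by combining acute (U+0301)
composed = "caf\u00e9.txt".encode("utf-8")
decomposed = "cafe\u0301.txt".encode("utf-8")

print(composed == decomposed)                                # False: different bytes
print(to_local_name(composed) == to_local_name(decomposed))  # True after normalizing
```

Without the normalization step, a name sent in decomposed form would
look up a different file than the same name sent precomposed, even
though Unicode considers the two canonically equivalent.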

I'd prefer always sending normalized data on the wire, but
unnormalized data is acceptable as long as implementations are
*required* to do the right thing.
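
For the "normalized on the wire" option, a sender would normalize
before encoding, and a receiver could cheaply check whether it still
has work to do. A minimal sketch, assuming NFC as the agreed wire form
(the post does not pick a form); encode_for_wire and needs_normalizing
are hypothetical names. unicodedata.is_normalized requires Python 3.8
or later.

```python
import unicodedata

# Assumption: NFC is the agreed wire form. The post prefers normalized
# data on the wire but does not say which normalization form.
WIRE_FORM = "NFC"


def encode_for_wire(name: str) -> bytes:
    """Sender side: normalize to the wire form, then encode as UTF-8."""
    return unicodedata.normalize(WIRE_FORM, name).encode("utf-8")


def needs_normalizing(wire_name: bytes) -> bool:
    """Receiver side: True if the peer sent a name that is not in the
    wire form, so the receiver must still normalize it itself."""
    return not unicodedata.is_normalized(WIRE_FORM, wire_name.decode("utf-8"))


print(encode_for_wire("cafe\u0301"))                    # b'caf\xc3\xa9'
print(needs_normalizing("cafe\u0301".encode("utf-8")))  # True
print(needs_normalizing(encode_for_wire("cafe\u0301"))) # False
```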

> So, for example, trying to negotiate the
> char-set in use during protocol startup or
> even on a per path basis, is insufficient.

I don't understand why you can't negotiate the character set using
some extension during the initial sftp handshake. I understand that
there are some problems with what RFC 3010 calls "local character
sets", but I see no problem with an extension, used at startup, that
says "let's use the universal UTF-8 character set/encoding".
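
One way such a startup extension could look is sketched below. It is a
hypothetical scheme, not a defined extension: the extension name
utf8-filenames@example.com is invented, and the packet layout only
roughly follows the SFTP drafts as I understand them (uint32 length,
byte type, uint32 version, then extension name/data string pairs). The
client offers the extension in SSH_FXP_INIT and the server echoes it in
SSH_FXP_VERSION only if it also supports UTF-8 file names.

```python
import struct

SSH_FXP_INIT = 1
SSH_FXP_VERSION = 2

# Hypothetical extension name; no such extension is defined in the post.
UTF8_EXT = b"utf8-filenames@example.com"


def ssh_string(data: bytes) -> bytes:
    """SSH 'string': uint32 big-endian length followed by the bytes."""
    return struct.pack(">I", len(data)) + data


def build_packet(ptype: int, version: int, extensions: dict) -> bytes:
    """uint32 length, byte type, uint32 version, then name/data pairs."""
    body = bytes([ptype]) + struct.pack(">I", version)
    for name, data in extensions.items():
        body += ssh_string(name) + ssh_string(data)
    return struct.pack(">I", len(body)) + body


def parse_packet(packet: bytes):
    """Return (type, version, extensions) from a packet built as above."""
    (length,) = struct.unpack_from(">I", packet, 0)
    body = packet[4:4 + length]
    ptype = body[0]
    (version,) = struct.unpack_from(">I", body, 1)
    extensions, off = {}, 5
    while off < len(body):
        (n,) = struct.unpack_from(">I", body, off)
        off += 4
        name = body[off:off + n]
        off += n
        (n,) = struct.unpack_from(">I", body, off)
        off += 4
        extensions[name] = body[off:off + n]
        off += n
    return ptype, version, extensions


def server_reply(init_packet: bytes, supports_utf8: bool) -> bytes:
    """Echo the extension back only if the server also supports it."""
    _, version, exts = parse_packet(init_packet)
    reply = {}
    if supports_utf8 and exts.get(UTF8_EXT) == b"1":
        reply[UTF8_EXT] = b"1"
    return build_packet(SSH_FXP_VERSION, min(version, 3), reply)


init = build_packet(SSH_FXP_INIT, 3, {UTF8_EXT: b"1"})
_, _, agreed = parse_packet(server_reply(init, supports_utf8=True))
print(UTF8_EXT in agreed)  # True: both sides agree to use UTF-8 file names
```

A server that does not know the extension simply omits it from its
reply, so an old peer falls back to the existing behaviour without any
extra round trip.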

I also find the discussion in RFC 3010 a little strange. If one uses
different character sets for different components of a path name,
one should expect problems; I agree totally with that. But that isn't
a problem with sftp or NFS: you'll get about the same problems if
everything is on a single local file system. So I'll put those
problems in the "broken local configuration" category.
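
The local version of the problem is easy to reproduce without any
network protocol. In this small Python sketch the path is a made-up
example: its first component was written by a program using
ISO-8859-1 and its second by one using UTF-8, and no single encoding
decodes both components correctly.

```python
# Hypothetical path: "café" written as ISO-8859-1 (b"caf\xe9"),
# "日本" written as UTF-8, joined on one local file system.
path = "café".encode("latin-1") + b"/" + "日本".encode("utf-8")


def decodes_as(raw: bytes, encoding: str) -> bool:
    """True if raw is a valid byte sequence in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as(path, "utf-8"))                 # False: b"\xe9/" is not valid UTF-8
print(path.decode("latin-1").endswith("日本"))   # False: decodes, but as mojibake
```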

I think the best way forward with UTF-8 support is for someone who
wants it to write up a draft specifying an extension for enabling
UTF-8 file names (or an extension for negotiating arbitrary character
sets); then we can iron out the details in the WG.

/Niels


