Re: Internationaliztion: UTF-8 for file names

To: Niels Möller <nisse%lysator.liu.se@localhost>
Subject: Re: Internationaliztion: UTF-8 for file names
From: "Joseph Galbraith" <galb-list%vandyke.com@localhost>
Date: Thu, 21 Mar 2002 08:54:56 -0700

> I also find the discussion in RFC3010 a little strange. If one ever
> uses different character sets for different components of a path name,
> one should expect some problems. I agree totally with that. But that
> isn't a problem with sftp or nfs: you'll get about the same problems
> if everything is on a single local filesystem. So I'll put those
> problems in the "broken local configuration" category.

That depends on the operating system.

NT is perfectly capable of supporting this,
and of having it work locally.

I name a directory using Cyrillic, and put files named
in Japanese inside of it.  Not a problem.  That's because
NT stores file names in Unicode.

(Now, I admit, this won't work if you are using the FAT file
system, but that's a different story; it is possible for
this to work perfectly using NTFS.)

My guess is that there are other filesystems
out there that use Unicode for file names.
BeOS's filesystem did, if I remember correctly.
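As an illustration (not part of the original message; the directory and file names are made up), a short Python sketch shows that a path mixing a Cyrillic directory with a Japanese file name encodes losslessly as UTF-8, while a single legacy code page such as Windows-1251 (`cp1251`) covers the directory name but not the file name.

```python
# Hypothetical path: Cyrillic directory name, Japanese file name.
path = "Документы/日本語.txt"

# UTF-8 can represent every component, and decoding restores the original text.
encoded = path.encode("utf-8")
assert encoded.decode("utf-8") == path


def encodable(text: str, charset: str) -> bool:
    """Return True if `text` can be encoded in `charset` without loss."""
    try:
        text.encode(charset)  # strict error handling raises on unmappable chars
        return True
    except UnicodeEncodeError:
        return False


# cp1251 (Windows Cyrillic) handles the directory but not the Japanese name.
print(encodable("Документы", "cp1251"))   # True
print(encodable("日本語.txt", "cp1251"))  # False
print(encodable(path, "utf-8"))           # True
```

A server that stores names in a single local code page would have to reject or mangle such a path; with UTF-8 on the wire, no component is lost.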

> I think the best way forward with utf8 support is that someone who
> wants it writes up a draft specifying an extension for enabling utf8
> file names (or an extension for negotiating arbitrary character sets),
> and then we can iron out the details in the wg.

Well, we could go that route. However, in that case,
when UTF-8 is not in use, we must specify which charset
is in use, according to my reading of RFC2277 section 3.1.

Also, my reading of RFC2277 section 3.1 leads me to believe that
not using UTF-8 is at least discouraged for new protocols.

I really think just specifying filenames as being encoded
in UTF-8 is the best solution.  UTF-8 is already used
throughout the other pieces of SSH; it complies with RFC2277;
and it allows systems that support multiple character sets to
work.
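To make the proposal concrete: if filenames are defined as UTF-8, an SFTP filename field is simply the UTF-8 bytes of the name carried in SSH's usual `string` encoding (a big-endian uint32 byte length followed by the bytes). The Python sketch below assumes exactly that wire format; the helper names are hypothetical and not taken from any SFTP implementation.

```python
import struct


def encode_ssh_string(data: bytes) -> bytes:
    """SSH 'string': uint32 big-endian length, then the raw bytes."""
    return struct.pack(">I", len(data)) + data


def encode_filename(name: str) -> bytes:
    """Hypothetical helper: a file name as UTF-8 inside an SSH string."""
    return encode_ssh_string(name.encode("utf-8"))


def decode_filename(buf: bytes) -> tuple[str, bytes]:
    """Parse one UTF-8 file name from the front of `buf`.

    Returns the decoded name and the remaining, unparsed bytes.
    """
    if len(buf) < 4:
        raise ValueError("truncated length field")
    (length,) = struct.unpack(">I", buf[:4])
    raw = buf[4:4 + length]
    if len(raw) != length:
        raise ValueError("truncated string")
    # Strict decoding: invalid UTF-8 is rejected instead of silently mangled.
    return raw.decode("utf-8"), buf[4 + length:]


wire = encode_filename("résumé.txt")  # 12 bytes of UTF-8 after the length
name, rest = decode_filename(wire)
print(name)
```

Because the length prefix counts bytes rather than characters, a receiver can skip or validate the field without knowing anything about the name's script.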

It also doesn't require us to register charset specifications
with IANA, or to try to use some existing specification
(which, unless it's code page information, is hard for me,
and I really don't want anyone in the world to even think
about code pages unless they MUST :-)  Not to mention that
code pages are poorly specified and non-standard.

We might be able to go with some other solution and still be
in compliance with RFC2277 as long as UTF-8 could be negotiated,
but that just complicates things because then we have to deal
with a client and server that can't come to an agreement on
what charset to use.
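The extra failure mode that negotiation introduces can be seen in a small Python sketch. Nothing here is a real SFTP extension: the function name and the preference-list scheme are hypothetical, modeled loosely on how SSH negotiates algorithms (the first entry in the client's list that the server also supports wins).

```python
from typing import Optional


def negotiate_charset(client_prefs: list[str],
                      server_supported: list[str]) -> Optional[str]:
    """Pick the first client-preferred charset the server also supports.

    Returns None when the lists share nothing; a protocol with negotiable
    charsets would then have to specify what happens next.
    """
    for charset in client_prefs:
        if charset in server_supported:
            return charset
    return None


print(negotiate_charset(["utf-8", "iso-8859-1"], ["iso-8859-1", "utf-8"]))  # utf-8
print(negotiate_charset(["shift_jis"], ["koi8-r", "iso-8859-1"]))          # None
```

With UTF-8 mandatory, the second case cannot arise: both sides always share the one encoding, so there is no disagreement to handle.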

I really think my users would be happiest if this stuff just
worked, which is possible with UTF-8 required, but seems
unlikely otherwise.

- Joseph

Follow-Ups:
- Re: Internationaliztion: UTF-8 for file names
  - From: Niels Möller

References:
- Internationaliztion: UTF-8 for file names
  - From: Joseph Galbraith
- Re: Internationaliztion: UTF-8 for file names
  - From: Niels Möller
