Re: UTF8

To: der Mouse <mouse%Rodents.Montreal.QC.CA@localhost>
Subject: Re: UTF8
From: nisse%lysator.liu.se@localhost (Niels Möller)
Date: 04 Jan 2005 13:45:01 +0100

der Mouse <mouse%Rodents.Montreal.QC.CA@localhost> writes:

> I'm faced with an encoding-agnostic
> filesystem interface and implementation, wherein filename components
> are sequences of octets not including 0x00 and 0x2f, independent of any
> characters;

Please leave the file system issues out of it for now. What's of
primary importance are the core drafts, and those deal with
usernames and passwords in utf8 form, *not* file names. The issues for
filenames, e.g. in sftp, are slightly different, and not relevant to
the core drafts.

> I'm faced with password hashing routines that work with
> octet strings, not character strings; etc.

> Am I required to reject attempted non-ASCII
> strings in these places for no reason other than an inability to know
> what the user intended the character set - if any - to be?  (For that
> matter, what grounds are there for assuming that octets in the ASCII
> range are intended to correspond to ASCII characters, rather than, say,
> KOI-7?)

I'm assuming you're talking about the server implementation now
(client side is comparatively trivial; convert input to utf8 based on
the current $LC_CTYPE). On the server side, the problem is that at login
time, you don't know the user's $LC_CTYPE. My recommendation is as
follows:

1. Choose one default encoding (be that plain ascii, or latin1, or
   koi-7, or normalized utf-8, depending on your context and
   preference).

2. Provide an option for the sysadmin to say that on his or her
   particular system, some other character set is used for user names
   and passwords.

Then convert the usernames and passwords you get on the wire to the
selected encoding. That almost solves the problem, and it's no big
deal.
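
To make the conversion step concrete, here is a minimal sketch in
Python, not taken from any real implementation. The names
DEFAULT_ENCODING and wire_to_local are made up for illustration, and
the NFC normalization step is my own assumption about what
"normalized" should mean here; the drafts may call for something
stricter.

```python
import unicodedata
from typing import Optional

# Step 1: one built-in default (hypothetical choice for this sketch).
DEFAULT_ENCODING = "ascii"


def wire_to_local(wire_bytes: bytes,
                  local_encoding: str = DEFAULT_ENCODING) -> Optional[bytes]:
    """Convert a UTF-8 string from the wire to the locally selected encoding.

    local_encoding plays the role of the sysadmin option from step 2.
    Returns None when the input is not valid UTF-8 or cannot be
    represented in the local encoding, so the caller can fail the login.
    """
    try:
        text = wire_bytes.decode("utf-8")
        # Assumption: compose characters (NFC) so that "o" + combining
        # diaeresis maps to the same octet as a precomposed o-umlaut.
        text = unicodedata.normalize("NFC", text)
        return text.encode(local_encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None
```

The octet string that comes back is what would be handed to the
existing octet-based password hashing or user lookup. A None result
means the string has no representation in the system's encoding, so it
cannot match any stored name or password anyway.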

Optionally, to support systems where different users use different
character sets for their usernames and/or passwords, use some per user
configuration or kludgery to figure out the user's character set.
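
One possible shape for the per-user variant, again only a sketch: the
table USER_ENCODINGS, the fallback SYSTEM_ENCODING and the function
password_octets are all hypothetical, and a real server would read the
mapping from whatever per-user configuration it has. The table is
assumed to be keyed on the user name as a Unicode string.

```python
from typing import Dict, Optional

# Hypothetical system-wide setting and per-user overrides.
SYSTEM_ENCODING = "latin-1"
USER_ENCODINGS: Dict[str, str] = {"alice": "koi8-r"}


def password_octets(wire_user: bytes, wire_password: bytes) -> Optional[bytes]:
    """Re-encode a UTF-8 password from the wire in the user's own charset.

    Falls back to SYSTEM_ENCODING for users without an override.
    Returns None if either string is not valid UTF-8 or the password
    cannot be represented in the chosen charset.
    """
    try:
        user = wire_user.decode("utf-8")
        password = wire_password.decode("utf-8")
        encoding = USER_ENCODINGS.get(user, SYSTEM_ENCODING)
        return password.encode(encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None
```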

I'll be happy to discuss these implementation issues (my
implementation doesn't get non-ascii quite right yet either), but we
should probably do that off-list.

> Given how common such systems are, it seems a bit odd that the IETF
> would take a position so apparently incompatible with them.

Do you have some numbers to back that up? I've seen quite a number
of unix systems, but as far as I can recall, I've *never* seen one
where usernames and passwords used non-ascii characters. (I *have*
seen plenty of non-ascii filenames, but as I said, that's a different
issue, and irrelevant to the core drafts). I live in latin1-land, not
Asia, though.

Best regards,
/Niels
