Re: UTF8

To: ietf-ssh%NetBSD.org@localhost
Subject: Re: UTF8
From: der Mouse <mouse%Rodents.Montreal.QC.CA@localhost>
Date: Tue, 4 Jan 2005 00:23:55 -0500 (EST)

>> What does "properly internationalized" mean - or perhaps more
>> precisely, what is there about being "properly internationalized"
>> that demands that usernames, passwords, and filenames consist of
>> character sequences rather than octet sequences?
> I'd appreciate replies off-list as I believe we are outside of the
> scope of this working group.

In general, perhaps, but insofar as it bears on implementing ssh, I am
inclined to disagree - which is why I'm replying on-list anyway.

> [...]
> We, the IETF, have decided for the most part that interoperability
> requires that things work independent of what input method is used.

As I see it, this amounts to "the IETF position is that humans think of
these things as character strings, so we demand that they be handled as
character strings by the protocol".

What is the IETF position, then, on how someone such as me should
handle the situation I'm faced with: writing software specified from
this point of view (ssh, in my case) for systems on which these
entities are _not_ character strings (a fairly traditional Unix
variant, NetBSD in my case)?  I'm faced with an encoding-agnostic
filesystem interface and implementation, wherein filename components
are sequences of octets not including 0x00 and 0x2f, independent of any
characters; I'm faced with password hashing routines that work with
octet strings, not character strings; etc.
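To make the octet-only model concrete, here is a minimal Python sketch. The function names are hypothetical. The validity rule is the one stated above (no 0x00, no 0x2f); treating an empty component as invalid is an added assumption, and "." / ".." and length limits are ignored. It shows that whether an octet string is a legal filename component has nothing to do with whether it happens to decode as UTF-8.

```python
def is_valid_component(name: bytes) -> bool:
    """Octet-only rule: no 0x00 (NUL) and no 0x2f ('/').
    Assumption: an empty component is treated as invalid.
    '.', '..' and length limits (e.g. NAME_MAX) are ignored."""
    return len(name) > 0 and 0x00 not in name and 0x2f not in name


def is_utf8(name: bytes) -> bool:
    """True if the octets happen to form valid UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


examples = [
    b"report.txt",                 # plain ASCII
    "caf\u00e9".encode("utf-8"),   # valid UTF-8
    "caf\u00e9".encode("latin-1"), # legal filename, not valid UTF-8
    b"a/b",                        # contains 0x2f: not a single component
]

for n in examples:
    print(n, is_valid_component(n), is_utf8(n))
```

The third example is the interesting case: the filesystem accepts it without complaint, but a protocol that insists on UTF-8 has no way to carry it.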

Are such systems beyond the pale for the IETF, and I can do anything I
want, with a suggestion that I try to stay within something like the
spirit of the spec?  Is it simply not possible to implement ssh (or
anything else specified with similar normalization rules) on such a
system within the spec without converting all the affected code
(filename, username, and password handling in ssh's case) to the
character-string paradigm?  Am I required to reject attempted non-ASCII
strings in these places for no reason other than an inability to know
what the user intended the character set - if any - to be?  (For that
matter, what grounds are there for assuming that octets in the ASCII
range are intended to correspond to ASCII characters, rather than, say,
KOI-7?)
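A short Python sketch of the ambiguity: the same octets decode without error under several character sets and yield different strings, so nothing in the octets themselves reveals which one the user meant. The encodings chosen are arbitrary illustrations. KOI-7 is not, to my knowledge, among Python's standard codecs, so KOI8-R, a related 8-bit Cyrillic encoding, stands in for the KOI family here.

```python
# The same octets, read under three different assumptions about the
# user's character set. Each decode succeeds; the results differ.
data = b"caf\xc3\xa9"

readings = {codec: data.decode(codec) for codec in ("utf-8", "latin-1", "koi8_r")}

for codec, text in readings.items():
    print(codec, repr(text))
```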

Or what?

Given how common such systems are, it seems a bit odd that the IETF
would take a position so apparently incompatible with them.  As an
implementer I find the situation rather confusing; there's obviously
something I don't understand going on, and I'd like to know what the
IETF's idea of the right thing for me to do here is.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse%rodents.montreal.qc.ca@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
