Re: UTF8

To: der Mouse <mouse%Rodents.Montreal.QC.CA@localhost>, ietf-ssh%NetBSD.org@localhost
Subject: Re: UTF8
From: Jeffrey Hutzelman <jhutz%cmu.edu@localhost>
Date: Thu, 20 Jan 2005 18:26:57 -0500

On Thursday, January 20, 2005 11:09:37 -0500 der Mouse <mouse%Rodents.Montreal.QC.CA@localhost> wrote:

Perhaps it appears that way to you.  I see no particular need for it.
(Except for conformance to the fiat of UTF-8, which since it's the
point under discussion can be ignored for the moment.)

Actually, that point is _not_ up for discussion. IETF policy, as described in RFC 2277, which represents the consensus of the IETF, says exactly what we must do with character data. I do not believe that revisiting that decision is in scope for us.

The current discussion centers around whether usernames and passwords as seen in ssh are character strings or not.

In your example, it would be ISO-8859-1.


For that user, it is.  Another user of the same system may be of Greek
extraction and use 8859-7, with username, say, 0xec 0xe1 0xf1 0xea (mu
alpha rho kappa, which I can't insert directly because this message is
in 8859-1, and I'd rather not mess around with trying to create a
multipart/mixed message).  And what's wrong with that?

Aha. I do believe that you just admitted this is a character string, and you couldn't include it in your message because its character set did not match that of the rest of the message. Note, incidentally, that you could have resolved this problem by using UTF-8 instead of ISO-8859-1 -- MIME does conform to IETF character set policy.
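
To illustrate the point, here is a minimal Python sketch using the four octets quoted above. The variable names are hypothetical, and the lowercase "åge" is an assumption made for the example. It shows that the same octets read as different characters depending on the charset applied, and that UTF-8 can carry both names in one piece of text.

```python
import unicodedata

# The four octets quoted in the message for the Greek username.
raw = bytes([0xEC, 0xE1, 0xF1, 0xEA])

# Same octets, two interpretations: the meaning depends on the charset.
as_greek = raw.decode("iso8859_7")    # mu alpha rho kappa
as_latin1 = raw.decode("iso8859_1")   # accented Latin letters instead

# UTF-8 can hold a Latin-1 name and the Greek name side by side.
# "\u00e5ge" is lowercase a-with-ring followed by "ge" (assumed spelling).
mixed = ("\u00e5ge " + as_greek).encode("utf-8")

# Print character names so the output is plain ASCII on any terminal.
for ch in as_greek:
    print(unicodedata.name(ch))
for ch in as_latin1:
    print(unicodedata.name(ch))
print(mixed.hex())
```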

There appears to be an unstated assumption lurking beneath all the
character-string positions that users are incapable of making, or at
least unwilling to make, adjustments and allowances for certain
impedance mismatches - allowances such as Åge, when visiting Mark,
realizing that to log in he has to type epsilon g e as his username.

Those assumptions are not at all unstated; I think we've made them quite clear. The point of internationalization of text strings is that users should not even have to think about those kinds of "allowances", let alone actually make them. Åge should be able to use Mark's keyboard to type Åge (hint - I'm having no trouble typing it on my US keyboard) and have it work.

But see, here is the real problem. Suppose I have a nice, modern, Unicode-using system. The one I'm sitting in front of at the moment uses UTF-8 for nearly everything. Windows boxes also use Unicode.

If I have a compliant ssh client, and Åge's home machine has a compliant server, then he can type his username, which my client will encode on the wire as <0xc3><0xa5><0x67><0x65>. His ssh server will then convert that to its character set, in which it is represented as <0xe5><0x67><0x65>, and he'll be able to log in.
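
The round trip described here can be sketched in Python. This is only an illustration under stated assumptions: `client_encode` and `server_to_local` are hypothetical helper names, not part of any ssh implementation, and the server's local charset is assumed to be ISO-8859-1. The username is spelled with lowercase å (U+00E5), whose UTF-8 encoding is 0xc3 0xa5.

```python
def client_encode(username: str) -> bytes:
    """Hypothetical client step: put the username on the wire as UTF-8."""
    return username.encode("utf-8")


def server_to_local(wire: bytes, local_charset: str) -> bytes:
    """Hypothetical server step: convert wire UTF-8 to the local charset."""
    return wire.decode("utf-8").encode(local_charset)


# Lowercase a-with-ring followed by "ge" (assumed spelling of the login name).
wire = client_encode("\u00e5ge")
local = server_to_local(wire, "iso8859_1")

print(wire.hex(" "))   # c3 a5 67 65
print(local.hex(" "))  # e5 67 65
```

Because the wire format is fixed, neither side needs to know which charset the other uses locally.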

On the other hand, if my ssh client and his insist on treating this character string as an "octet string" that does not require tagging or conversion, then Åge cannot log in at all, because on my UTF-8 system he _cannot_ produce the sequence <0xe5><0x67><0x65>, because that is not valid UTF-8.
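
A quick check of that last claim, as a minimal Python sketch (the variable names are hypothetical): a strict UTF-8 decoder rejects the three octets, because 0xe5 opens a three-byte sequence and 0x67 is not a continuation byte.

```python
# The ISO-8859-1 octets for the username, as quoted in the message.
latin1_bytes = bytes([0xE5, 0x67, 0x65])

try:
    latin1_bytes.decode("utf-8")   # strict decoding is the default
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(is_valid_utf8)  # False
```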


Personally, I prefer the model which allows my Swedish friends to log in.
