IETF-SSH archive


Re: UTF8



On Thursday, January 20, 2005 11:09:37 -0500 der Mouse <mouse%Rodents.Montreal.QC.CA@localhost> wrote:


> Perhaps it appears that way to you.  I see no particular need for it.
> (Except for conformance to the fiat of UTF-8, which since it's the
> point under discussion can be ignored for the moment.)

Actually, that point is _not_ up for discussion. IETF policy, as described in RFC 2277, which represents the consensus of the IETF, says exactly what we must do with character data. I do not believe that revisiting that decision is in scope for us.

The current discussion centers on whether usernames and passwords, as seen in ssh, are character strings or not.




>> In your example, it would be ISO-8859-1.

> For that user, it is.  Another user of the same system may be of Greek
> extraction and use 8859-7, with username, say, 0xec 0xe1 0xf1 0xea (mu
> alpha rho kappa, which I can't insert directly because this message is
> in 8859-1, and I'd rather not mess around with trying to create a
> multipart/mixed message).  And what's wrong with that?

Aha. I do believe that you just admitted this is a character string, and you couldn't include it in your message because its character set did not match that of the rest of the message. Note, incidentally, that you could have resolved this problem by using UTF-8 instead of ISO-8859-1 -- MIME does conform to IETF character set policy.
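To make the point concrete, here is a minimal Python sketch (the language is our choice for illustration; nothing in the thread specifies one). The four octets from the Greek example only mean mu alpha rho kappa when read as ISO-8859-7; read as ISO-8859-1, the charset of the quoted message, they are entirely different characters. Once the name is re-encoded as UTF-8, its meaning no longer depends on an out-of-band charset.

```python
# The four octets from the example, as stored on an ISO-8859-7 system.
raw = bytes([0xEC, 0xE1, 0xF1, 0xEA])

# The same octets decoded under two different charsets.
as_greek = raw.decode("iso-8859-7")   # what the user means: mu alpha rho kappa
as_latin1 = raw.decode("iso-8859-1")  # what an ISO-8859-1 reader sees instead

print(as_greek)   # μαρκ
print(as_latin1)  # ìáñê

# Re-encoded as UTF-8, the name carries its own unambiguous representation.
utf8 = as_greek.encode("utf-8")
print(utf8.hex(" "))  # ce bc ce b1 cf 81 ce ba
```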



> There appears to be an unstated assumption lurking beneath all the
> character-string positions that users are incapable of making, or at
> least unwilling to make, adjustments and allowances for certain
> impedance mismatches - allowances such as Åge, when visiting Mark,
> realizing that to log in he has to type epsilon g e as his username.

Those assumptions are not at all unstated; I think we've made them quite clear. The point of internationalization of text strings is that users should not even have to think about those kinds of "allowances", let alone actually make them. Åge should be able to use Mark's keyboard to type Åge (hint - I'm having no trouble typing it on my US keyboard) and have it work.
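The "allowance" described above can be reproduced in a few lines of Python. This sketch assumes, as the octets <0xe5><0x67><0x65> quoted below indicate, that the username is the lowercase "åge"; its ISO-8859-1 octets, rendered by a terminal set to ISO-8859-7, come out as epsilon, g, e.

```python
# Åge's username as his own ISO-8859-1 system stores it (lowercase, by assumption).
raw = "åge".encode("iso-8859-1")
print(raw.hex(" "))  # e5 67 65

# The same octets as an ISO-8859-7 terminal would render them.
shown_on_greek_terminal = raw.decode("iso-8859-7")
print(shown_on_greek_terminal)  # εge
```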

But see, here is the real problem. Suppose I have a nice, modern, Unicode-using system. The one I'm sitting in front of at the moment uses UTF-8 for nearly everything. Windows boxes also use Unicode.

If I have a compliant ssh client, and Åge's home machine has a compliant server, then he can type his username, åge, which my client will encode on the wire as <0xc3><0xa5><0x67><0x65>. His ssh server will then convert that to its character set, in which it is represented as <0xe5><0x67><0x65>, and he'll be able to log in.
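The compliant path can be sketched as follows. The names client_encode and server_to_local are hypothetical and not taken from any ssh implementation; the sketch only assumes that the wire form is UTF-8 and that the server knows the charset of its own account database.

```python
def client_encode(username: str) -> bytes:
    """Hypothetical client step: put the username on the wire as UTF-8."""
    return username.encode("utf-8")


def server_to_local(wire: bytes, local_charset: str) -> bytes:
    """Hypothetical server step: convert the UTF-8 wire form to the local charset.

    Raises UnicodeDecodeError if the wire form is not valid UTF-8, and
    UnicodeEncodeError if the name cannot be represented in local_charset.
    """
    return wire.decode("utf-8").encode(local_charset)


wire = client_encode("åge")
print(wire.hex(" "))                                 # c3 a5 67 65
print(server_to_local(wire, "iso-8859-1").hex(" "))  # e5 67 65
```

The same two functions cover the Greek user from earlier: with "iso-8859-7" as the local charset, "μαρκ" arrives as 0xec 0xe1 0xf1 0xea. A name with no representation in the server's charset raises UnicodeEncodeError, which a server would presumably have to treat as a failed login.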

On the other hand, if my ssh client and his server insist on treating this character string as an "octet string" that does not require tagging or conversion, then Åge cannot log in at all: on my UTF-8 system he _cannot_ produce the sequence <0xe5><0x67><0x65>, because that is not valid UTF-8.
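Why the octet-string model fails here can be checked directly: a strict UTF-8 decoder rejects the ISO-8859-1 octets, because 0xe5 announces a three-octet sequence and 0x67 is not a continuation octet. A minimal check in Python:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(bytes([0xE5, 0x67, 0x65])))        # False: ISO-8859-1 "åge"
print(is_valid_utf8(bytes([0xC3, 0xA5, 0x67, 0x65])))  # True: UTF-8 "åge"
```

Anything a UTF-8 system produces from typed characters is valid UTF-8 by construction, so no keystrokes on such a system can yield the first sequence.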

Personally, I prefer the model which allows my Swedish friends to log in.





