IETF-SSH archive


Re: UTF8



>>> I'm not sure I see the problem.  Implementations that don't know
>>> what charset their authentication database uses, will be limited to
>>> ASCII, or whatever safe subset they can assume.
>> Why?  Why should one octet-string system talking to another
>> octet-string system be unable to use non-ASCII octets?
> You shouldn't compare octet-strings with each other unless you know
> which charset they were encoded in.

(a) Take off your character-string blinders for a moment!  Your
statement is true only if both octet strings represent character
strings, and it's equality of the character strings that you really
care about.

(b) That aside, why shouldn't one octet-string system talking to
another octet-string system be able to use non-ASCII octets?  If a
human tells them to do this, it's up to the human to ensure they use
sufficiently compatible character sets when character sets matter.  And
such cases are relatively common - for example, when both systems are
encoding-agnostic but the user has chosen to use the same character
set on both.

> In your example, an SECSH implementation would appear to need to know
> what charset /etc/passwd is using.

Perhaps it appears that way to you.  I see no particular need for it.
(Except for conformance to the fiat of UTF-8, which since it's the
point under discussion can be ignored for the moment.)

> In your example, it would be ISO-8859-1.

For that user, it is.  Another user of the same system may be of Greek
extraction and use 8859-7, with username, say, 0xec 0xe1 0xf1 0xea (mu
alpha rho kappa, which I can't insert directly because this message is
in 8859-1, and I'd rather not mess around with trying to create a
multipart/mixed message).  And what's wrong with that?
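
A minimal Python sketch of the octet-string view, for illustration
only: the table PASSWD and the function lookup are hypothetical and do
not come from any SECSH implementation.  The match is done on raw
octets and never decodes them; the two decode calls at the end only
show what a human using each charset would see for the same octets.

```python
# Hypothetical passwd-like table keyed by raw octets.
# No charset is recorded anywhere; entries are just byte strings.
PASSWD = {
    b"\xec\xe1\xf1\xea": 1001,  # mu alpha rho kappa, as typed by an 8859-7 user
    b"mouse": 1002,             # plain ASCII username
}


def lookup(username_octets: bytes):
    """Return the uid on an exact octet-for-octet match, else None."""
    return PASSWD.get(username_octets)


greek = bytes([0xEC, 0xE1, 0xF1, 0xEA])
uid = lookup(greek)                      # matches without knowing any charset

# Only the human's display cares about the charset:
as_8859_7 = greek.decode("iso8859_7")    # four Greek letters
as_8859_1 = greek.decode("iso8859_1")    # four accented Latin letters
```

The lookup succeeds either way; which glyphs appear is purely a matter
of how the user's terminal is configured.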

There appears to be an unstated assumption lurking beneath all the
character-string positions that users are incapable of making, or at
least unwilling to make, adjustments and allowances for certain
impedance mismatches - allowances such as Åge, when visiting Mark,
realizing that to log in he has to type epsilon g e as his username.
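
The adjustment Åge has to make can be sketched in a few lines of
Python.  This assumes his home system stores his username as the
8859-1 octets of "Åge" and that Mark's terminal is set to 8859-7; the
variable names are made up for the example.

```python
# Octets stored on Åge's home system (he uses 8859-1 there).
home_octets = "Åge".encode("iso8859_1")

# On Mark's 8859-7 terminal, the characters that produce those same
# octets are whatever 8859-7 assigns to them.
typed_at_marks = home_octets.decode("iso8859_7")

# Typing those characters on the 8859-7 terminal sends the original octets.
sent = typed_at_marks.encode("iso8859_7")
```

Here home_octets is 0xc5 0x67 0x65, and 0xc5 is capital epsilon in
8859-7, so the octets on the wire are identical even though the glyphs
he types are not.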

A lot of users doubtless are incompetent to do this.  More are
unwilling to.  But I do not see either as a reason to prevent the
others from accepting the mismatch and doing useful things - especially
since I rather suspect that using an encoding-agnostic system like a
traditional Unix correlates positively and moderately strongly with
such ability and willingness.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse%rodents.montreal.qc.ca@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


