IETF-SSH archive


Re: UTF8



>> Perhaps it appears that way to you.  I see no particular need for
>> it.  (Except for conformance to the fiat of UTF-8, which since it's
>> the point under discussion can be ignored for the moment.)
> Actually, that point is _not_ up for discussion.  IETF policy, [...]

You cut critical context.  The "it" refers to requiring that all local
usernames on an encoding-agnostic system be character strings in one
common charset, not to anything on the wire in any protocol.  (The
"fiat of UTF-8" refers to the compulsion that implementing a conformant
ssh imposes.)

> [IETF policy] says exactly what we must do with character data.  I do
> not believe that revisiting that decision is in scope for us.

No, it isn't, not on this list.

My point is, usernames and passwords aren't always "character data".
They certainly aren't to the username and password checking code on an
encoding-agnostic system.  I see no reason why that can't be exposed to
remote users just as it is to local users.

>>> In your example, it would be ISO-8859-1.
>> For that user, it is.  Another user of the same system may be of
>> Greek extraction and use 8859-7, with username, say, 0xec 0xe1 0xf1
>> 0xea (mu alpha rho kappa, which I can't insert directly [...])
> Aha.  I do believe that you just admitted this is a character string,

Except for the implication ("admitted") that this is something I would
have preferred not to say, yes - to one particular human.  Another
human might see the same octet string as "ìáñê".  Both views exist
solely inside human minds (and, transiently, at the input and output
interfaces between the humans and the computers).  Neither is right or
wrong, independent of context from a human.
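To make this concrete, the four octets from the example decode to different text depending on which charset the reader assumes.  A minimal Python sketch, using only the standard `iso8859_7` and `latin_1` codecs:

```python
# The username as stored on an encoding-agnostic system: just octets.
raw = bytes([0xEC, 0xE1, 0xF1, 0xEA])

# The same octets, viewed through two different charsets.
as_greek = raw.decode("iso8859_7")   # the 8859-7 reading
as_latin1 = raw.decode("latin_1")    # the 8859-1 reading

print(as_greek, as_latin1)  # μαρκ ìáñê

# Neither view changes the octets themselves: each round-trips exactly.
assert as_greek.encode("iso8859_7") == raw
assert as_latin1.encode("latin_1") == raw
```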

> But see, here is the real problem.  [...]
> If I have a compliant ssh client, and Åge's home machine has a
> compliant server, then he can type his username, which my client will
> encode on the wire as <0xc3><0x85><0x67><0x65>.  His ssh server will
> then convert that to its character set,

How does it know that "its character set" has to be 8859-1 for this
login attempt?  It could be Mark typing instead, in which case it would
have to convert to 8859-7 instead of 8859-1, and it has no a priori way
to tell which is which.  The ssh server, after all, has no way of
knowing that mu alpha rho kappa makes sense in a way that i-grave
a-acute n-tilde e-circumflex doesn't - and even if it did, it would
have to try both to tell which makes more sense (and then try all other
possibilities, too, in case one of them makes even more sense).
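The server's dilemma can be sketched the same way.  This assumes a hypothetical server that receives the username as UTF-8 and must turn it into a local octet string by guessing a charset; `to_local` is an illustrative helper, not part of any ssh implementation.  Two different character strings end up on the same local account, each under a different guess, so the mapping cannot be undone without knowing who is typing.

```python
from typing import Optional


def to_local(wire: bytes, charset: str) -> Optional[bytes]:
    """Convert a UTF-8 username from the wire into local octets.

    Returns None if the characters cannot be represented in `charset`.
    Hypothetical helper, for illustration only.
    """
    text = wire.decode("utf-8")
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


local_name = bytes([0xEC, 0xE1, 0xF1, 0xEA])

# Mark types his Greek username; someone else types the 8859-1 reading.
mark_wire = "\u03bc\u03b1\u03c1\u03ba".encode("utf-8")   # "μαρκ"
other_wire = "\u00ec\u00e1\u00f1\u00ea".encode("utf-8")  # "ìáñê"

# Both wire strings reach the same local octets, each under a different
# charset, so a server that tries charsets in turn accepts either one.
assert to_local(mark_wire, "iso8859_7") == local_name
assert to_local(other_wire, "latin_1") == local_name

# Each fails outright under the other user's charset.
assert to_local(mark_wire, "latin_1") is None
assert to_local(other_wire, "iso8859_7") is None
```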

> Personally, I prefer the model which allows my Swedish friends to log
> in.

But your model is predicated upon having no encoding-agnostic systems
involved.  As pretty as that may be in theory, as fine as it may be for
you, that is not an option for me.

So far, the only options available for me, as an implementer on an
encoding-agnostic system, appear to be

(1) Ignore conformance in this respect; or
(2) Impose some encoding by system-wide sysadmin fiat.

For the client side, and to some extent for non-username strings, there
is also

(3) Allow users to specify a charset somehow.
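On the client side, the three options amount to three ways of turning locally stored octets into what goes on the wire.  A minimal sketch, assuming a hypothetical `local_to_wire` function and a default system charset of 8859-1; how a user would actually specify a charset for option (3) is left open:

```python
from typing import Optional


def local_to_wire(local: bytes, option: int,
                  system_charset: str = "latin_1",
                  user_charset: Optional[str] = None) -> bytes:
    """Turn a locally stored string (raw octets) into wire octets.

    option 1: ignore conformance and send the octets unchanged.
    option 2: assume one system-wide charset chosen by the sysadmin.
    option 3: use a charset the user specified, else fall back to option 2.
    Raises UnicodeDecodeError if the octets are invalid in the charset used.
    """
    if option == 1:
        return local
    charset = user_charset if (option == 3 and user_charset) else system_charset
    # Decode with the assumed charset, then re-encode as UTF-8 for the wire.
    return local.decode(charset).encode("utf-8")
```

Option (1) is non-conformant because arbitrary local octets need not be valid UTF-8; 0xEC 0xE1 0xF1 0xEA, for instance, is not.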

None of these options is entirely satisfactory.  But the WG appears to
have quite firmly got the notion stuck in its collective head that
these things are always, *always* character strings; it is unwilling
to do character-set tagging, and unwilling to tell the IETF "your
policy is too crippling for us, we'll go build something that works,
and if you don't like it, tough".  (Probably too few people here both
use encoding-agnostic systems and go beyond - or want to be able to go
beyond - ASCII, so most don't see the policy as crippling.)  Not that
the last point is entirely surprising, since this *is* an IETF list -
if anyone knows of any ssh design work being done outside the IETF,
I'd be interested in getting involved.

I suspect I'll have to take option (1), though if it's not too much
work I may try to provide for (2) (and possibly (3)) - in the case of
(2), I expect to be able to do at least "ASCII-only, disallow high-half
octets entirely".
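The "ASCII-only" form of option (2) is the simplest to enforce, because ASCII octets mean the same thing in UTF-8, 8859-1 and 8859-7 alike.  A minimal sketch, with `check_ascii_only` as a hypothetical name:

```python
def check_ascii_only(name: bytes) -> bytes:
    """Restrictive form of option (2): accept ASCII usernames only.

    Any octet in the high half (0x80-0xFF) is rejected outright, so the
    local octets and their UTF-8 wire form are always identical.
    """
    if any(b >= 0x80 for b in name):
        raise ValueError("high-half octet in username; only ASCII is allowed")
    return name
```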

Unless someone has some other alternative to suggest.

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse%rodents.montreal.qc.ca@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


