IETF-SSH archive


Re: UTF8



Derek Fawcus <dfawcus%cisco.com@localhost> writes:

> On Wed, Jan 19, 2005 at 07:39:23PM +0100, Simon Josefsson wrote:
>> 
>> Perhaps it would be useful to consider what problems you would have
>> interacting with EBCDIC systems, to imagine what the situation is for
>> Latin-1 or Unicode users.  ASCII isn't the only 7-bit encoding.
>
> That is part of the point.  My local machine is using UTF-8,  the Unix
> machines I ssh into are in 8859-1,  and I do use a couple of 8 bit
> characters.  However simply not in my username or password.  Hence the
> local unicode characters (when UTF-8 encoded) are valid ASCII,  and
> hence I can log in.

That works as expected if both systems use ASCII.  I'd imagine that
the octets of your ASCII username would be equally valid on an EBCDIC
system, but would decode to different characters.

The fact that UTF-8 and Latin-1 are identical for ASCII is a useful
property, but it may limit how you think of the problem.
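
To make the point concrete, here is a small sketch in Python.  The
usernames "alice" and "josé" are made up for illustration, and cp037
is just one EBCDIC code page picked as an example; other EBCDIC
variants would behave similarly but not identically.

```python
# Hypothetical usernames, chosen only for illustration.
ascii_name = "alice"
accented_name = "josé"

# For pure ASCII text, UTF-8 and Latin-1 produce the same octets.
same_for_ascii = ascii_name.encode("utf-8") == ascii_name.encode("latin-1")

# Outside ASCII the two encodings diverge (one octet vs. two for "é").
differs_beyond_ascii = (
    accented_name.encode("utf-8") != accented_name.encode("latin-1")
)

# An EBCDIC code page (cp037 assumed here) uses different octets
# even for plain ASCII letters.
ebcdic_differs = ascii_name.encode("cp037") != ascii_name.encode("ascii")

# ASCII octets are still decodable as EBCDIC, but yield other characters.
misread = ascii_name.encode("ascii").decode("cp037")

print(same_for_ascii, differs_beyond_ascii, ebcdic_differs, repr(misread))
```

So the ASCII overlap hides the problem between UTF-8 and Latin-1, but
it does not make the problem go away.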

> Your point wrt EBCDIC is valid,  but I don't know the appropriate answer.
> Also,  do their authentication systems work with a charset,  or octets?

If humans entered a name of the user using a keyboard at some point in
time, and the string was stored as octets, the string must have been
encoded using some charset.  You can't go from human symbols to octets
without using some kind of charset.

The only system I can recall that uses true opaque octets would be
some kind of PIN or one-time-pad system that only uses digits.

If the SECSH protocol uses ASCII today, the EBCDIC server would have
to convert the wire charset from ASCII to EBCDIC before it can compare
the string.

This is similar to the UTF-8 situation under discussion:

If the SECSH protocol uses UTF-8 tomorrow, the server will have to
convert the wire UTF-8 data to the local charset before it can
compare it against the authentication database.  Only the server
implementation can know the local charset.

As far as I understand, Unix /etc/passwd only supports ASCII usernames,
so Unix SECSH implementations would naturally be limited to ASCII.

Other authentication databases may support Unicode (or Latin-1, or
EBCDIC...).

I'm not sure I see the problem.  Implementations that don't know
what charset their authentication database uses will be limited to
ASCII, or whatever safe subset they can assume.  Implementations that
know what charset the authentication mechanism uses can convert the
name as appropriate.
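
The fallback for the first kind of implementation could look roughly
like this in Python.  The name safe_subset_username is hypothetical, and
treating 7-bit ASCII as the safe subset is itself an assumption that
would not hold on an EBCDIC host.

```python
from typing import Optional


def safe_subset_username(wire: bytes) -> Optional[bytes]:
    """Accept a wire name only if it is pure 7-bit ASCII.

    Intended for servers that do not know the charset of their
    authentication database.  Returns the octets unchanged when they
    are safe to pass on, otherwise None.
    """
    try:
        wire.decode("ascii")
    except UnicodeDecodeError:
        return None
    return wire
```

An ASCII name such as b"alice" passes through unchanged; the UTF-8
octets of "josé" are rejected rather than guessed at.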

Regards,
Simon


