IETF-SSH archive


Re: additional core draft nits in need of WG attention.



At last, the final comment in this series...

Bill Sommerfeld <sommerfeld%east.sun.com@localhost> writes:

> > >4.  Section 5, last paragraph on page 9.  Saying that UTF-8 is the
> > >encoding for passwords means that implementations need to check for valid
> > >UTF-8 encoding.  This could lead to unexpected failures. It would be much
> > >better to say that passwords are arbitrary binary strings with no
> > >specified encoding.  Exact match of the binary strings ought to 
> > >be sufficient.
> 
> Thoughts?  My understanding is that requesting exact match of
> internationalized input is problematic under some circumstances..

This is confused. For both usernames and passwords, the server has to
massage the input into its native format (Unicode normalization form
C, or latin-1, or whatever) before further processing. That further
processing is typically a lookup in a database (for usernames) and "one-way
encryption" for passwords. In both cases, the operation will typically
fail if the input is not normalized.
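A minimal sketch of that server-side step, assuming a server whose native format is latin-1. The helper name `to_native` is hypothetical. The strict UTF-8 decode is the "validity check" the quoted comment worries about. The example also shows why exact match of the raw bytes is not enough: two correct UTF-8 spellings of the same name differ on the wire but agree after normalization.

```python
import unicodedata

def to_native(wire: bytes, native_encoding: str = "latin-1") -> bytes:
    """Hypothetical helper: convert a UTF-8 string from the client into the
    server's native format (latin-1 assumed here)."""
    # Strict decode: raises UnicodeDecodeError on invalid UTF-8.
    text = wire.decode("utf-8")
    # Normalize to NFC so composed and decomposed forms become identical.
    text = unicodedata.normalize("NFC", text)
    # Encode in the native format; raises UnicodeEncodeError if the
    # string cannot be represented there.
    return text.encode(native_encoding)

composed = "\u00c5ke".encode("utf-8")     # Å as one code point
decomposed = "A\u030ake".encode("utf-8")  # A + COMBINING RING ABOVE

print(composed == decomposed)                        # False: raw bytes differ
print(to_native(composed) == to_native(decomposed))  # True: same native string
```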

Treating usernames and passwords differently makes no sense. Either
we say that both usernames and passwords are ascii-only (8-bit
characters could be allowed, with implementation-defined meaning, so
that we get a royal mess where users will never know whether or not
8-bit characters will work), or we say that both usernames and
passwords are utf8.

I've argued earlier that the sender of utf8 strings should be required
to normalize them (Unicode normalization form C, or nameprep; I'm not sure
which is most appropriate). But I didn't get much support for that, so
the server MUST do the right thing when converting the strings to
its native format. And if the server does the right thing for
usernames, there's no extra cost in doing the same for passwords.
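To illustrate the "no extra cost" point, here is a sketch in which usernames and passwords share one normalization routine. All function names are hypothetical, latin-1 is assumed as the native encoding, and PBKDF2 from Python's `hashlib` merely stands in for whatever one-way function a real server applies to passwords.

```python
import hashlib
import unicodedata

def normalize_wire_string(wire: bytes) -> str:
    # Hypothetical shared step: strict UTF-8 decode, then NFC.
    return unicodedata.normalize("NFC", wire.decode("utf-8"))

def lookup_key(username_wire: bytes) -> bytes:
    # Usernames: normalize, then encode for a latin-1 user database.
    return normalize_wire_string(username_wire).encode("latin-1")

def password_hash(password_wire: bytes, salt: bytes) -> bytes:
    # Passwords: the very same normalization, then a one-way function.
    native = normalize_wire_string(password_wire).encode("latin-1")
    return hashlib.pbkdf2_hmac("sha256", native, salt, 10_000)

salt = b"per-user-salt"  # hypothetical stored salt
stored = password_hash("s\u00e4kert".encode("utf-8"), salt)    # precomposed ä
offered = password_hash("sa\u0308kert".encode("utf-8"), salt)  # a + combining diaeresis
print(stored == offered)  # True
```

The password path reuses the username path unchanged; the only extra work is the hashing the server does anyway.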

The goal of using utf8 is to be able to support scenarios like this:

  * Unix server with usernames and passwords encoded in latin-1 in the
    /etc/passwd file, and running in a latin-1 locale.
  
  * Unix client, also in a latin-1 locale.
  
  * Windows Pocket PC client, which is a native unicode application
    and has never heard of latin-1.

  * The username "Åke Ärlig", which can be encoded in at least 6
    different equally correct ways in unicode as well as in proper
    utf8 without overlong sequences.
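One way to arrive at six spellings (assuming these are the ones meant): Å can be U+00C5, U+212B ANGSTROM SIGN, or A followed by U+030A COMBINING RING ABOVE, and Ä can be U+00C4 or A followed by U+0308 COMBINING DIAERESIS. The sketch below builds all 3 × 2 combinations, checks that they are distinct byte strings in valid utf8, and shows that NFC normalization maps all of them to the single latin-1 string a server like the one above would have in /etc/passwd.

```python
import itertools
import unicodedata

# Code-point spellings of Å and Ä.
A_RING = ["\u00c5", "\u212b", "A\u030a"]  # precomposed, ANGSTROM SIGN, decomposed
A_DIAERESIS = ["\u00c4", "A\u0308"]       # precomposed, decomposed

variants = [f"{a}ke {b}rlig" for a, b in itertools.product(A_RING, A_DIAERESIS)]

# Six different byte strings on the wire, all well-formed utf8.
wire_forms = {v.encode("utf-8") for v in variants}
print(len(wire_forms))  # 6

# After NFC, every variant becomes the same latin-1 string.
native = {unicodedata.normalize("NFC", v).encode("latin-1") for v in variants}
print(native)  # {b'\xc5ke \xc4rlig'}
```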

If this doesn't Just Work, then the protocol is broken. And it seems
we are asked to break it.

/Niels


