Re: UTF8

To: ietf-ssh%netbsd.org@localhost
Subject: Re: UTF8
From: der Mouse <mouse%Rodents.Montreal.QC.CA@localhost>
Date: Fri, 24 Dec 2004 17:39:21 -0500 (EST)

>> All the same UTF-8 issues I've raised repeatedly,
> My requirement as an IESG member is that it be possible to have a
> properly internationalized ssh.  Among other things that means the
> characters in usernames and passwords need to belong to some
> character set.

Why?  Perhaps this is the fundamental part I'm missing.  What does
"properly internationalized" mean - or perhaps more precisely, what is
there about being "properly internationalized" that demands that
usernames, passwords, and filenames consist of character sequences
rather than octet sequences?
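For concreteness, here is a minimal Python sketch of how an SSH `string` field is framed on the wire. It assumes the usual layout: a big-endian uint32 length followed by that many octets. The helper names `encode_ssh_string` and `decode_ssh_string` are hypothetical. The framing never consults a character set. Whether the contents must be UTF-8 is therefore a question of which octet sequences the spec permits, not of what the transport can carry.

```python
import struct


def encode_ssh_string(data: bytes) -> bytes:
    """Frame raw octets as an SSH 'string': uint32 length (big-endian) + bytes.

    Nothing here inspects or depends on a character set.
    """
    return struct.pack(">I", len(data)) + data


def decode_ssh_string(buf: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Read one SSH 'string' from buf at offset; return (octets, next_offset)."""
    if len(buf) - offset < 4:
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("truncated string body")
    return buf[start:end], end


# Hypothetical username that is not valid UTF-8 (0xE9 is "é" in ISO 8859-1).
name = b"ren\xe9"
wire = encode_ssh_string(name)
print(wire.hex())  # 0000000472656ee9
```

The octets round-trip unchanged. A rule requiring UTF-8 would be a check layered on top of this framing, not something the framing needs.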

> Personally, I'd be delighted if you could work within this
> requirement and figure out implementation advice or language that
> made it possible/easier for people on Unix and other systems where
> local characters are not tagged with a character set.

I doubt that's possible, because the characters, if any, those octet
sequences are intended to correspond to are not stored anywhere
accessible to ssh (and indeed may not exist - the octet sequences may
correspond to characters only coincidentally, by someone insisting on
interpreting them as character codes, such as for display).

> I believe the documents are acceptable if you don't manage to solve
> this problem; only accepting ASCII is one way to implement the
> protocol.

I find that...well, unacceptable, really.  There is no reason except
administrative fiat that the various octet strings in question can't be
real transparent octet strings, to be interpreted as characters, if at
all, by mutual agreement between the entities doing so.  For example,
I, on a charset-blind system, may name some files with octet sequences
that make sense in 8859-1 and others with octet sequences that make
sense in KOI-8, relying on other information (stored nowhere but my
wetware) to disambiguate which is which.  There is no technical reason
I couldn't sftp them to another, similarly charset-blind, system; it is
purely fiat that would prevent it.
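A small Python illustration of the 8859-1 / KOI-8 point, under the assumption that the filename octets below are hypothetical. The filesystem stores only the bytes. Each charset yields a different reading, and only the user knows which one was intended. The `interpretations` helper is likewise hypothetical.

```python
def interpretations(raw: bytes, charsets=("iso8859-1", "koi8-r")) -> dict:
    """Decode the same octets under each candidate charset.

    A charset-blind filesystem stores only `raw`; which reading is "right"
    is known only to the user who chose the name.
    """
    return {cs: raw.decode(cs) for cs in charsets}


# Hypothetical filename octets: 0xE9 is "é" in ISO 8859-1 but a Cyrillic
# letter in KOI8-R.
name = b"caf\xe9"
for charset, text in interpretations(name).items():
    print(charset, repr(text))
```

Copying `name` between two charset-blind systems just moves those four octets. Neither end needs to pick a reading for the transfer to work.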

Of course, it breaks when trying to sftp to a charset-aware filesystem,
unless I choose to use UTF-8 on the charset-blind side.  That's
unavoidable, no matter what the standard says; interoperability between
charset-string filesystems and octet-string filesystems without human
help cannot be expected in any case.
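The "human help" needed when the far side is charset-aware can be sketched in Python as a conversion whose key input comes from the user, not the filesystem. `to_utf8_name` is a hypothetical helper, and the charset names are only examples.

```python
def to_utf8_name(raw: bytes, assumed_charset: str) -> bytes:
    """Re-encode a legacy filename as UTF-8 for a charset-aware peer.

    `assumed_charset` has to come from outside the filesystem (here, from
    the user): the octets carry no record of the charset they were written
    in. Raises UnicodeDecodeError if `raw` is not valid in that charset.
    """
    return raw.decode(assumed_charset).encode("utf-8")


legacy = b"caf\xe9"
# The same local name yields different UTF-8 names depending on the guess.
print(to_utf8_name(legacy, "iso8859-1"))  # b'caf\xc3\xa9'
print(to_utf8_name(legacy, "koi8-r"))
```

A wrong guess produces a well-formed UTF-8 name that is still the wrong name. No amount of protocol text can detect that.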

Similar remarks apply to usernames and passwords, though the examples
I've come up with are more contrived.

> I suspect people who find better solutions will have an advantage in
> certain markets.

I think my implementation, being for charset-blind systems, will treat
such things as pure octet sequences.  I may add an option to disallow
non-ASCII (or perhaps just disallow anything that's not well-formed
UTF-8), but if so it will be to satisfy protocol conformance pedants,
and be documented as such.
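The option mentioned above might look roughly like the following Python sketch. The `NamePolicy` enum and `check_name` function are hypothetical. The default passes octets through untouched, and the stricter modes exist only for conformance. Python's strict UTF-8 decoder rejects overlong forms and encoded surrogates, which makes it a reasonable well-formedness check.

```python
from enum import Enum


class NamePolicy(Enum):
    OCTETS = "octets"     # pass names through untouched (the default)
    UTF8_ONLY = "utf8"    # reject anything that is not well-formed UTF-8
    ASCII_ONLY = "ascii"  # reject any octet >= 0x80


def check_name(raw: bytes, policy: NamePolicy = NamePolicy.OCTETS) -> bytes:
    """Return raw unchanged if it satisfies policy, else raise ValueError."""
    if policy is NamePolicy.ASCII_ONLY and any(b >= 0x80 for b in raw):
        raise ValueError("non-ASCII octet in name")
    if policy is NamePolicy.UTF8_ONLY:
        try:
            raw.decode("utf-8", errors="strict")
        except UnicodeDecodeError as exc:
            raise ValueError(f"not well-formed UTF-8: {exc}") from exc
    return raw
```

For example, `check_name(b"caf\xe9")` returns the octets as-is, while `check_name(b"caf\xe9", NamePolicy.UTF8_ONLY)` raises `ValueError`. In no mode does the check alter the octets. It only decides whether to accept them.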

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse%rodents.montreal.qc.ca@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
