IETF-SSH archive
UTF-8 [was Re: New Version Notification - draft-sgtatham-secsh-iutf8-05.txt]
> What’s inherently broken in using UTF-8...?
Different characters occupy different amounts of space.
Some characters are larger than one addressing unit (on most machines).
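
To make the two points above concrete, here is a minimal sketch, assuming Python 3. The sample characters and the helper name utf8_len are illustrative choices, not anything from this thread.

```python
# Illustrative sample characters (chosen for this sketch only):
# LATIN CAPITAL A, e-acute, EURO SIGN, and an emoji outside the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def utf8_len(ch: str) -> int:
    """Return how many octets one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# Map each sample character to its encoded width in octets.
widths = {ch: utf8_len(ch) for ch in samples}

for ch, n in widths.items():
    print(f"U+{ord(ch):04X} occupies {n} octet(s) in UTF-8")
```

With octet-addressed storage, the Nth character of a string therefore cannot be found by simple indexing; the encoded text has to be scanned from a known boundary.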
There are octet sequences which are not valid UTF-8 character
sequences. This results in text tools that break on small amounts of
non-UTF-8 text mixed into the text they're handling. (This is not
really a problem with UTF-8 proper - there are also octets that are not
valid 8859-1 text, for example - but a problem with how it's
implemented; in my experience UTF-8 text tools break when faced with
non-UTF-8 octet sequences, whereas single-octet text tools usually
don't break when faced with invalid octets.)
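
As one illustration of the behaviour described above, here is a sketch assuming Python 3 and its built-in codecs. The sample octets and the helper try_decode are made up for this example, and other tools may behave differently. A strict UTF-8 decoder rejects a stray 8859-1 octet, while a single-octet decoder accepts it.

```python
# 0xE9 is e-acute in 8859-1; in UTF-8 it starts a three-octet sequence,
# so followed by a space (0x20) it is an invalid sequence.
raw = b"caf\xe9 au lait"


def try_decode(data: bytes, encoding: str):
    """Decode strictly; return None instead of raising on invalid input."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


strict_utf8 = try_decode(raw, "utf-8")        # fails: not valid UTF-8
as_latin1 = try_decode(raw, "iso-8859-1")     # succeeds: every octet maps

# Two ways a UTF-8 tool can be written to tolerate the stray octet:
replaced = raw.decode("utf-8", errors="replace")          # lossy, uses U+FFFD
escaped = raw.decode("utf-8", errors="surrogateescape")   # lossless round trip
restored = escaped.encode("utf-8", errors="surrogateescape")

print(strict_utf8, as_latin1, replaced, restored == raw)
```

Whether a tool breaks is thus a matter of which error policy its author chose, which matches the point that this is an implementation issue rather than one with UTF-8 proper.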
Some characters have multiple distinct encodings. (Okay, that too is
not really UTF-8 proper - it's actually Unicode.)
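
A small sketch of the multiple-encodings point, assuming Python 3 and its standard unicodedata module. The example character is an arbitrary choice. The same visible letter can be one code point or a base letter plus a combining mark, and the two forms yield different UTF-8 octets unless normalized.

```python
import unicodedata

composed = "\u00e9"      # e-acute as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# The strings render alike but compare unequal, as do their UTF-8 octets.
same_text = composed == decomposed
composed_octets = composed.encode("utf-8")
decomposed_octets = decomposed.encode("utf-8")

# Normalizing both sides to one form (NFC or NFD) makes them comparable.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

print(same_text, composed_octets, decomposed_octets, nfc_equal, nfd_equal)
```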
I've seen it said (by the git documentation) that transcoding from some
character sets like 8859-1 to UTF-8 is not a reversible operation.
This seems dubious to me, but, if true, it would be another, and fairly
strong, strike against UTF-8 in my opinion.
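
The reversibility question can at least be probed with a short sketch, assuming Python 3 and its built-in iso-8859-1 and utf-8 codecs. It checks only plain transcoding of all 256 octet values. It says nothing about what the git documentation had in mind, nor about tools that also normalize or filter the text along the way.

```python
# Every possible single octet, 0x00 through 0xFF.
all_octets = bytes(range(256))

# 8859-1 -> Unicode text -> UTF-8 octets.
text = all_octets.decode("iso-8859-1")
utf8 = text.encode("utf-8")

# UTF-8 octets -> Unicode text -> 8859-1 again.
round_trip = utf8.decode("utf-8").encode("iso-8859-1")

# With these codecs the round trip is lossless: 128 octets stay single,
# and the upper 128 become two octets each in UTF-8.
reversible = round_trip == all_octets
print(reversible, len(utf8))
```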
That's just what comes to mind immediately. I don't use UTF-8 myself if
I can help it (when I run into something using it my major concern is
how to make it stop doing so), so it's entirely possible there are
others I'm just not aware of.
/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B