tech-userlevel archive
Re: bin/57544: sed(1) and regex(3) problem with encoding
> This whole "i18n" and "l10n" is a nightmare---and this is a not
> english native speaker who writes it...
And as a native anglophone - who knows a smattering of assorted other
languages - I agree.
I recently had occasion to send mail to a domain whose mail was hosted
by Google. I sent it as
8859-14, because it involved a small amount of text in one of the
Gaelic dialects and I prefer to use seanċló when I can.
The text included a ċ. But apparently, despite my marking it as
8859-14, by the time it got displayed (in their webmail interface, I
think), it had been converted into U+0104, LATIN CAPITAL LETTER A WITH
OGONEK, rather than the correct mapping, U+010B, LATIN SMALL LETTER C
WITH DOT ABOVE.
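Here is a minimal Python sketch of one possible explanation. It is an assumption and is not established by the mail itself. In ISO 8859-14, ċ is byte 0xA5. The same byte is Ą (U+0104) in windows-1250, which Python calls `cp1250`. A reader that ignored the declared charset and fell back to windows-1250 would therefore produce exactly the substitution described. The 8859-1 reading is included for comparison.

```python
import unicodedata

# ċ encoded the way a message labelled charset=ISO-8859-14 carries it
raw = "ċ".encode("iso8859_14")
assert raw == b"\xa5"

# Hypothesis: what the same byte turns into under different charset guesses
readings = {}
for charset in ("iso8859_14", "cp1250", "latin-1"):
    ch = raw.decode(charset)
    readings[charset] = f"U+{ord(ch):04X} {unicodedata.name(ch)}"
    print(f"{charset:<12} {readings[charset]}")
```

Only the `cp1250` row matches the U+0104 that was displayed. This points at a windows-1250 fallback somewhere along the path, but it does not prove one.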
So I sent a test mail, containing each of the accented vowels and each
of the dotted consonants (well, most of them; I forgot Ṫ and ṫ, but
that's minor).
That mail, for all that it was also marked as being 8859-14, got
displayed as if it were 8859-1.
Not even Google, apparently, can get it even vaguely right.
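The sketch below shows why a test like this is revealing. It uses a hypothetical sample string, because the actual test mail isn't reproduced here. The accented vowels occupy the same byte values in ISO 8859-14 and ISO 8859-1, so they survive being read as 8859-1. Every dotted consonant, by contrast, lands on an 8859-1 symbol such as ¥ or ±.

```python
# Hypothetical sample text; the original test mail's contents are not given.
vowels = "áéíóúàèìòùÁÉÍÓÚÀÈÌÒÙ"
dotted = "ḂḃĊċḊḋḞḟĠġṀṁṖṗṠṡṪṫ"

survives = {}
for label, text in (("vowels", vowels), ("dotted", dotted)):
    raw = text.encode("iso8859_14")  # bytes as labelled and sent
    shown = raw.decode("latin-1")    # bytes as displayed if read as 8859-1
    survives[label] = shown == text
    print(f"{label}: {shown}  (unchanged: {survives[label]})")
```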
/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML mouse%rodents-montreal.org@localhost
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B