tech-misc archive
Re: wchar_t encoding?
Paul Koning <Paul_Koning%dell.com@localhost> wrote:
>> > ...
>> > The trouble for NetBSD is that it asks iconv to translate to a
>> > character set named "wchar_t". That means "whatever the encoding is
>> > for the wchar_t data type". GNU libiconv supports that, so on
>> > platforms that use that library things are fine.
>
> I did some digging to see how libiconv implements that feature.
>
> If __LIBC_ISO_10646__ is defined then it simply aliases this to an
> appropriate width Unicode (ucs2 or ucs4). That applies to Linux, for
> example.
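The first branch amounts to a compile-time choice. A minimal sketch: it tests the C99 predefined macro __STDC_ISO_10646__, which promises that wchar_t values are ISO 10646 (Unicode) code points; whether libiconv checks exactly this macro or the one named above is an assumption here. The helper wchar_t_alias is hypothetical, and the returned names are libiconv's host-byte-order encodings UCS-4-INTERNAL and UCS-2-INTERNAL.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: if the C implementation guarantees that wchar_t
   holds ISO 10646 code points, "wchar_t" can simply alias a fixed-width
   Unicode encoding in host byte order, chosen by the width of wchar_t.
   Returns NULL when there is no such guarantee, i.e. when the wchar_t
   encoding may depend on the locale and a fallback is needed. */
static const char *wchar_t_alias(void)
{
#ifdef __STDC_ISO_10646__
    if (sizeof(wchar_t) == 4)
        return "UCS-4-INTERNAL";   /* libiconv: UCS-4, host byte order */
    if (sizeof(wchar_t) == 2)
        return "UCS-2-INTERNAL";   /* libiconv: UCS-2, host byte order */
#endif
    return NULL;
}
```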
>
> If it isn't defined (as is the case on NetBSD) but mbrtowc() exists,
> then it uses that function. More precisely, a conversion to "wchar_t"
> first converts to Unicode, which is then fed into mbrtowc to produce the
> wchar_t encoding. mbrtowc knows about any locale issues...
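For reference, this is how mbrtowc() is normally driven: it decodes bytes in whatever multibyte encoding the current LC_CTYPE locale selects, carrying shift state in an mbstate_t, and returns (size_t)-1 for an invalid sequence and (size_t)-2 for an incomplete one. A minimal sketch; the helper mb_to_wide is hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode a multibyte string, encoded in the current
   locale's charset, into wchar_t one character at a time with mbrtowc().
   Returns the number of wide characters produced, or -1 on bad input. */
static long mb_to_wide(const char *s, wchar_t *out, size_t outcount)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    size_t left = strlen(s);
    size_t n = 0;
    while (left > 0 && n < outcount) {
        size_t r = mbrtowc(&out[n], s, left, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return -1;                  /* invalid or truncated sequence */
        if (r == 0)
            break;                      /* decoded a NUL character */
        s += r;
        left -= r;
        n++;
    }
    return (long)n;
}
```

Nothing in this loop assumes UTF-8: after setlocale(LC_CTYPE, "") the same bytes are interpreted in the user's locale encoding, which may be UTF-8, an ISO-8859 variant, EUC-JP and so on.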
>
> I guess that means that "multibyte" is Unicode, or UTF-8??? I don't see
> that documented in any manpage. It also means that if you have a source
> character that's not in Unicode but is in whatever encoding wchar_t
> uses, it would not be handled by the libiconv implementation of iconv()
> because it uses Unicode as an intermediate form.
Yeah, this fallback seems bogus. mbtowc & co. expect the source to be
in the current locale's charset, so it's wrong to feed them Unicode data
(even if wchar_t *is* always Unicode internally).
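A fallback consistent with this objection would first convert the source into the current locale's charset, as reported by nl_langinfo(CODESET), and only then hand the bytes to mbrtowc(). A minimal sketch under that assumption, not libiconv's actual code; the helper to_wchar_via_locale and its buffer-size heuristic are hypothetical.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not libiconv's code): convert `in`, encoded in
   `fromcode`, to wchar_t in two stages.  Stage 1 converts to the current
   locale's multibyte charset, because that is the input mbrtowc() expects;
   stage 2 decodes those bytes with mbrtowc().
   Returns the number of wide characters produced, or -1 on failure. */
static long to_wchar_via_locale(const char *in, const char *fromcode,
                                wchar_t *out, size_t outcount)
{
    /* Stage 1: source charset -> locale charset. */
    iconv_t cd = iconv_open(nl_langinfo(CODESET), fromcode);
    if (cd == (iconv_t)-1)
        return -1;
    size_t inleft = strlen(in);
    size_t mbsize = inleft * 4 + 16;    /* assumed to be enough room */
    char *mb = malloc(mbsize);
    if (mb == NULL) {
        iconv_close(cd);
        return -1;
    }
    char *inp = (char *)in, *mbp = mb;
    size_t mbleft = mbsize;
    size_t r = iconv(cd, &inp, &inleft, &mbp, &mbleft);
    if (r != (size_t)-1)
        r = iconv(cd, NULL, NULL, &mbp, &mbleft);  /* flush shift state */
    iconv_close(cd);
    if (r == (size_t)-1) {              /* e.g. EILSEQ: not representable */
        free(mb);
        return -1;
    }

    /* Stage 2: locale charset -> wchar_t, as the C library defines it. */
    mbstate_t st;
    memset(&st, 0, sizeof st);
    const char *p = mb;
    size_t left = (size_t)(mbp - mb);
    size_t n = 0;
    int failed = 0;
    while (left > 0 && n < outcount) {
        size_t k = mbrtowc(&out[n], p, left, &st);
        if (k == (size_t)-1 || k == (size_t)-2) {
            failed = 1;                 /* invalid or truncated sequence */
            break;
        }
        if (k == 0)
            break;                      /* decoded a NUL character */
        p += k;
        left -= k;
        n++;
    }
    free(mb);
    return failed ? -1 : (long)n;
}
```

The price of routing through the locale charset is that only characters that charset can represent survive: in the C locale, whose codeset is usually plain ASCII, a UTF-8 "é" would be expected to make stage 1 fail with EILSEQ.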
SY, Uwe
--
uwe%stderr.spb.ru@localhost | To get to the bottom of things
http://snark.ptc.spbu.ru/~uwe/ | Is to go to ruin