tech-misc archive
Re: wchar_t encoding?
Paul Koning <Paul_Koning%dell.com@localhost> wrote:
> I'm working on a patch to gdb 7.1 to make it work on NetBSD. The issue
> is that GDB 7 uses iconv to handle character strings, and uses wide
> chars internally so it can handle various non-ASCII scripts.
>
> The trouble for NetBSD is that GDB asks iconv to translate to a character
> set named "wchar_t". That means "whatever the encoding is for the
> wchar_t data type". GNU libiconv supports that, so on platforms that
> use that library things are fine.
>
> The trouble is that I'm getting pushback on the patch, because of
> concerns that the encoding used for wchar_t is not actually UCS-4.
> In particular, there is this article:
> http://www.gnu.org/software/libunistring/manual/libunistring.html#The-wchar_005ft-mess
> which says that on Solaris and FreeBSD the encoding of wchar_t is
> "undocumented and locale dependent". (Ye gods!)

Why are they so surprised about that?  C99 says:

    3.7.3
    [#1] wide character
    bit representation that fits in an object of type wchar_t,
    capable of representing any character in the current locale

It's simply impossible to always use Unicode as the only encoding for
wchar_t, since not every charset maps 1:1 onto Unicode.

Besides, iconv does not return (fsvo "return") wide strings; it writes
its output through a good old pointer to char.  Do they pass a pointer
to wchar_t, cast to char *, as the destination?
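
To make the question concrete, here is a rough sketch of what I imagine
such a caller looks like.  This is my guess at the pattern, not GDB's
actual code: the function name iconv_to_wchar is made up, and the
"wchar_t" charset name is not something POSIX promises, so iconv_open
may simply fail on a given platform.  The cast on the input pointer is
there because POSIX declares iconv() with a plain char ** argument.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: ask iconv for the "wchar_t" charset and let it
 * write straight into a wchar_t buffer through a char pointer.
 * Returns the number of wchar_t units produced, or (size_t)-1 if this
 * iconv does not know "wchar_t" or the conversion fails.
 */
size_t
iconv_to_wchar(const char *from_charset, const char *in,
    wchar_t *out, size_t outcap)
{
    char *inp = (char *)in;      /* POSIX iconv() wants char **, not const */
    char *outp = (char *)out;    /* iconv only knows about bytes */
    size_t inleft, outleft, rc;
    iconv_t cd;

    assert(in != NULL && out != NULL);
    inleft = strlen(in);
    outleft = outcap * sizeof(wchar_t);

    cd = iconv_open("wchar_t", from_charset);
    if (cd == (iconv_t)-1)
        return (size_t)-1;       /* charset name not supported here */

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* Bytes written, expressed in wchar_t units. */
    return (size_t)(outp - (char *)out) / sizeof(wchar_t);
}
```

Whatever ends up in the buffer is only meaningful if the iconv
implementation and the C library agree on what a wchar_t holds.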

If they just assume it's going to be a pointer to a wide string, then
the correct implementation of "wchar_t" is for iconv to convert to a
plain multibyte string in the current locale's charset and then convert
that to a wide string.
Or do they actually assume it's going to be UTF-32?
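
The two-step route can be sketched roughly like this, with no
assumption about what wchar_t contains.  The helper name
to_wide_via_locale and the 256-byte intermediate buffer are arbitrary
choices of mine, and the caller is assumed to have called
setlocale(LC_CTYPE, "") already if it wants anything other than the
"C" locale.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert `in` (encoded in from_charset) to a wide
 * string by going through the current locale's multibyte charset.
 * Returns the number of wide characters stored (excluding the NUL), or
 * (size_t)-1 on any failure, including truncation.
 */
size_t
to_wide_via_locale(const char *from_charset, const char *in,
    wchar_t *out, size_t outcap)
{
    char mb[256];                /* intermediate buffer; size is arbitrary */
    char *inp = (char *)in;      /* POSIX iconv() wants char **, not const */
    char *outp = mb;
    size_t inleft, outleft = sizeof(mb) - 1;
    size_t rc, n;
    iconv_t cd;

    assert(in != NULL && out != NULL);
    inleft = strlen(in);

    /* Step 1: source charset -> multibyte string in the locale's charset. */
    cd = iconv_open(nl_langinfo(CODESET), from_charset);
    if (cd == (iconv_t)-1)
        return (size_t)-1;
    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    *outp = '\0';

    /* Step 2: multibyte -> wide, using the C library's own idea of wchar_t. */
    n = mbstowcs(out, mb, outcap);
    if (n == (size_t)-1 || n >= outcap)
        return (size_t)-1;       /* invalid sequence or no room for the NUL */
    return n;
}
```

Since mbstowcs comes from the same C library that defines wchar_t, the
result is consistent with the locale by construction, whatever the
encoding turns out to be.
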
SY, Uwe
--
uwe%stderr.spb.ru@localhost | Zu Grunde kommen
http://snark.ptc.spbu.ru/~uwe/ | Ist zu Grunde gehen