Subject: Re: iconv and conversion from/to local charset and wchar_t
To: None <tech-userlevel@netbsd.org>
From: Hendrik Sattler <ubq7@stud.uni-karlsruhe.de>
List: tech-userlevel
Date: 02/12/2004 18:37:13
Hi,
On Friday, 30 January 2004 18:25, Noriyuki Soda wrote:
> So, to make scmxx work on both NetBSD-1.6 and NetBSD-current,
> the following code is needed to convert locale dependent string
> to UCS-4:
> [2] http://mail-index.netbsd.org/tech-userlevel/2004/01/31/0000.html:
> BTW, Solaris 8 and Solaris 9 only support UTF-7/8/16 for the direct
> conversion from/to UCS-4.
> So, you have to use the following way to make your program work
> on Solaris 8 and Solaris 9:
> 1. use iconv(nl_langinfo(CODESET), "UTF-8") to convert
> locale dependent string to UTF-8,
> Of course, this can be omitted, if nl_langinfo(CODESET)
> returns "UTF-8".
> then
> 2. use iconv("UTF-8", "UCS-4") to convert UTF-8 to
> UCS-4 with machine dependent endianness.
> (1. can be omitted,
I have now rewritten the whole thing:
http://cvs.sf.net/viewcvs.py/scmxx/scmxx_C/src/unicode.c?rev=1.8&view=markup
It works just fine on Solaris 8/SPARC without the above workaround (the
intermediate conversion to UTF-8).
It also works fine on systems using GNU iconv.
Thomas mailed me that there are still problems on NetBSD-current, such as
"Error on text conversion to internal charset: Illegal byte sequence" (EILSEQ)
when using a custom escape sequence like "\20ac" (Euro sign). It also shows
'?' on output that should instead be formatted as "\XXXX" (like "\20ac").
The puzzling part is that only NetBSD-current shows this behaviour. Maybe
iconv() is broken there and returns 0 even though a character was not
translatable and was therefore mapped to '?'?
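To make the question about the return value concrete: POSIX specifies that
iconv() returns the number of non-reversible conversions it performed, so a
strict caller can treat any result greater than 0 as a lossy substitution.
Below is a minimal sketch in C. convert_strict() is a hypothetical helper, not
code from scmxx, and it assumes the POSIX prototype of iconv() (some systems
declare the input pointer as const char ** instead).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from scmxx).
 * Converts inlen bytes of `in` from charset `from` to charset `to`.
 * Returns  0 if every character was converted exactly,
 *          1 if iconv() reported non-reversible conversions
 *            (for example a substitution with '?'),
 *         -1 on error, with errno set (EILSEQ, EINVAL, E2BIG, ...).
 * On success *written holds the number of bytes stored in `out`.
 */
int convert_strict(const char *from, const char *to,
                   const char *in, size_t inlen,
                   char *out, size_t outlen, size_t *written)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    /* Assumption: POSIX prototype with char **inbuf. */
    char *inp = (char *)in;
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outlen;

    size_t n = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (n != (size_t)-1) {
        /* Flush any pending shift state for stateful encodings. */
        if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
            n = (size_t)-1;
    }

    int saved_errno = errno;
    iconv_close(cd);

    if (n == (size_t)-1) {
        errno = saved_errno;
        return -1;
    }

    *written = outlen - outleft;
    /* n counts characters that could not be mapped one-to-one. */
    return n > 0 ? 1 : 0;
}
```

With a check like this, an implementation that silently maps an
untranslatable character to '?' but still returns 0 would show up as a
result of 0 together with a '?' in the output buffer, which is exactly the
case I suspect above.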
Additionally, does your iconv() try to interpret escape sequences (characters
after a '\')? If so, please don't.
Or maybe I did something wrong and it works on two different implementations
by accident?
Maybe one of you can take a look at it?
Thanks
Hendrik