Re: wide characters and i18n

To: tech-userlevel%netbsd.org@localhost
Subject: Re: wide characters and i18n
From: Matthew Mondor <mm_lists%pulsar-zone.net@localhost>
Date: Fri, 16 Jul 2010 12:37:34 -0400

On Fri, 16 Jul 2010 16:50:12 +0100
Sad Clouds <cryintothebluesky%googlemail.com@localhost> wrote:

> 2. The interfaces for C library multi-byte to wide, and wide to
> multi-byte conversion functions are so badly designed, it's not even
> funny. The biggest problem with those functions is the fact they expect
> NUL-terminated strings. If you have a partial (not NUL-terminated)
> string in the buffer, you can't call a string conversion function on it,
> because it won't stop until it finds a NUL and you end up with a buffer
> overrun. You cannot "artificially" NUL-terminate the string, because
> after reading the NUL char, the function will reset the mbstate_t object
> to the initial state. This will mess up the next sequence of multi-byte
> characters if the encoding has state.
> 
> I spent two days, jumping through the hoops and trying to figure out
> how to convert partial strings. I think I nailed it in the end with 30%
> performance penalty, but still 3.5 times faster than iconv().
> 
> If anyone is interested, I can post the code for the wrapper
> functions...
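
For readers following the thread, the partial-buffer problem described
above can be sketched with mbrtowc(), which takes an explicit byte count
and so never reads past the end of an unterminated buffer (POSIX.1-2008
also defines mbsnrtowcs(), which takes a byte limit). This is only a
minimal sketch and not the wrapper code mentioned in the quoted message:
the function name chunk_mbstowcs and its parameters are hypothetical, and
treating an embedded NUL as exactly one byte is a simplifying assumption
that may not hold for every stateful encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper, for illustration only.
 * Converts at most `len` bytes of `src` (which need not be NUL-terminated)
 * into at most `dstcap` wide characters in `dst`.  Conversion state is
 * carried in `*ps` so that a character split across two buffers can be
 * completed by the next call.
 * Stores the number of input bytes consumed in `*consumed`.
 * Returns the number of wide characters written, or (size_t)-1 if an
 * invalid sequence was found.
 */
size_t
chunk_mbstowcs(wchar_t *dst, size_t dstcap, const char *src, size_t len,
    mbstate_t *ps, size_t *consumed)
{
	size_t in = 0, out = 0;

	assert(dst != NULL && src != NULL && ps != NULL && consumed != NULL);

	while (in < len && out < dstcap) {
		wchar_t wc;
		size_t r = mbrtowc(&wc, src + in, len - in, ps);

		if (r == (size_t)-1) {
			/* Invalid sequence; report how far we got. */
			*consumed = in;
			return (size_t)-1;
		}
		if (r == (size_t)-2) {
			/*
			 * Incomplete character at the end of the buffer:
			 * the remaining bytes were absorbed into *ps.
			 */
			in = len;
			break;
		}
		if (r == 0)
			r = 1;	/* embedded NUL; assumed to be one byte */
		dst[out++] = wc;
		in += r;
	}
	*consumed = in;
	return out;
}
```

A caller would start from a zeroed mbstate_t and pass the same object
with every chunk it receives. If the function stops because dst is full,
*consumed is less than len and the remaining bytes can be passed again.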

In case it is useful, I also wrote an implementation of UTF-8 <->
UTF-32 conversion and released it under a BSD-like license:

http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/~checkout~/mmondor/mmsoftware/mmlib/utf8.c?rev=1.2;content-type=text%2Fplain
http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/~checkout~/mmondor/mmsoftware/mmlib/utf8.h?rev=1.1;content-type=text%2Fplain

However, I have no benchmark comparing it against another implementation.
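
To illustrate the general idea of a length-bounded UTF-8 to UTF-32
decoder, here is a minimal sketch. It is NOT the code from the utf8.c
linked above; the function name utf8_decode_one and its return
convention are assumptions made for this example. Because it takes an
explicit length and keeps no hidden state, it can be called on an
unterminated buffer and reports a truncated sequence instead of reading
past the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Decodes one code point from the first `len` bytes of `s`.
 * Returns the number of bytes consumed (1 to 4), 0 if the sequence is
 * truncated (more input is needed), or -1 if it is invalid.
 */
int
utf8_decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
	/* Smallest code point allowed for each sequence length. */
	static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t c;
	int need, i;

	assert(s != NULL && cp != NULL);

	if (len == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		need = 2; c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		need = 3; c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		need = 4; c = s[0] & 0x07;
	} else {
		return -1;	/* stray continuation or invalid lead byte */
	}

	for (i = 1; i < need; i++) {
		if ((size_t)i >= len)
			return 0;	/* truncated: wait for more bytes */
		if ((s[i] & 0xC0) != 0x80)
			return -1;	/* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}

	/* Reject overlong forms, surrogates, and values past U+10FFFF. */
	if (c < min_cp[need] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return -1;

	*cp = c;
	return need;
}
```

When the function returns 0, the caller keeps the unconsumed tail bytes
and retries once more input has arrived.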
-- 
Matt
