tech-userlevel archive
Re: Unicode programming
On Wed, 05 Oct 2011 15:51:52 -0400
Ken Hornstein <kenh%pobox.com@localhost> wrote:
> - Assuming the above is correct ... what do programmers do in terms of
> parsing things like UTF-8 into Unicode codepoints, since you don't
> necessarily know that mbrtowc() will give you a Unicode codepoint on
> some (looks like many) systems. I guess iconv() looks like something
> that handles a lot of encodings, and it seems to be lots of places;
> I'm also aware of icu. I'm also wondering what people do about things
> like finding out how many columns a particular series of Unicode codepoints
> occupies; I know about things like wcswidth(), but again you're not
> guaranteed that wide characters are Unicode codepoints.
When doing it in C, I used a custom library
(http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/mmsoftware/mmlib/utf8.c?rev=1.2;content-type=text%2Fplain
and
http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/mmsoftware/mmlib/utf8.h?rev=1.1;content-type=text%2Fplain),
but I have not used it in some time. More recently I have been using a
higher-level language that supports Unicode natively and already
includes the conversion facilities (and more advanced Unicode features
than just encoding/decoding). I did use iconv from the shell when I
needed it, however, and I remember using it from PHP (I'm not sure
whether that was PHP's own implementation or libc's, though)...
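
To illustrate the kind of thing such a library does, here is a minimal
sketch of a locale-independent UTF-8 decoder in C. It is not the code
from the mmlib files linked above; the function name utf8_decode and
its interface are made up for this example. It assumes the input is
meant to be UTF-8 and rejects overlong forms, surrogates, and values
above U+10FFFF, so the result is always a Unicode code point no matter
what the platform's wchar_t holds.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one UTF-8 sequence from s, of which at
 * most n bytes are available.  On success, store the code point in *cp
 * and return the number of bytes consumed (1 to 4).  Return 0 if the
 * input is empty, truncated, malformed, overlong, a surrogate, or
 * above U+10FFFF.
 */
size_t
utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
	/* Smallest code point that legitimately needs each length. */
	static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t c;
	size_t len, i;

	if (n == 0)
		return 0;

	if (s[0] < 0x80) {			/* plain ASCII */
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {	/* 110xxxxx */
		len = 2;
		c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {	/* 1110xxxx */
		len = 3;
		c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {	/* 11110xxx */
		len = 4;
		c = s[0] & 0x07;
	} else {
		return 0;	/* stray continuation or invalid lead byte */
	}

	if (n < len)
		return 0;	/* truncated sequence */

	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;	/* not a 10xxxxxx continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}

	if (c < min_cp[len])
		return 0;	/* overlong encoding */
	if (c >= 0xD800 && c <= 0xDFFF)
		return 0;	/* UTF-16 surrogate, not a valid scalar value */
	if (c > 0x10FFFF)
		return 0;	/* beyond the Unicode range */

	*cp = c;
	return len;
}
```

A caller would loop over a buffer, advancing by the returned length and
treating a return of 0 as a decoding error. Column width is a separate
problem: it still needs a width table keyed on the code point, which
this sketch does not attempt.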
--
Matt