Re: Unicode programming

To: tech-userlevel%netbsd.org@localhost
Subject: Re: Unicode programming
From: Matthew Mondor <mm_lists%pulsar-zone.net@localhost>
Date: Wed, 5 Oct 2011 19:01:55 -0400

On Wed, 05 Oct 2011 15:51:52 -0400
Ken Hornstein <kenh%pobox.com@localhost> wrote:

> - Assuming the above is correct ... what do programmers do in terms of
>   parsing things like UTF-8 into Unicode codepoints, since you don't
>   necessarily know that mbrtowc() will give you a Unicode codepoint on
>   some (looks like many) systems.  I guess iconv() looks like something
>   that handles a lot of encodings, and it seems to be lots of places;
>   I'm also aware of icu.  I'm also wondering what people do about things
>   like finding out how many columns a particular series of Unicode codepoints
>   occupies; I know about things like wcswidth(), but again you're not
>   guaranteed that wide characters are Unicode codepoints.

When doing it in C, I used a custom library
(http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/mmsoftware/mmlib/utf8.c?rev=1.2;content-type=text%2Fplain
and
http://cvs.pulsar-zone.net/cgi-bin/cvsweb.cgi/mmondor/mmsoftware/mmlib/utf8.h?rev=1.1;content-type=text%2Fplain),
but I have not used it in some time.  More recently I have been using a
higher-level language that supports Unicode and already includes the
conversion facilities (along with more advanced Unicode features than
just encoding/decoding).  I did use iconv from the shell when I needed
it, however, and I remember using it from PHP (I'm not sure whether that
was PHP's own implementation or libc's, though)...
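
For illustration, here is a minimal sketch of the kind of decoding a
small custom UTF-8 routine has to do when mbrtowc() cannot be relied on
to return Unicode codepoints.  It is not the code from the mmlib files
linked above; the function name utf8_decode and its calling convention
are hypothetical, chosen only for this example.  It assumes the caller
wants strict decoding, so overlong forms, surrogates and values above
U+10FFFF are rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence from s (at most len bytes) into *cp.
 * Returns the number of bytes consumed, or 0 if the input is empty,
 * truncated or malformed.  utf8_decode is a hypothetical name used
 * for this sketch only.
 */
size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
	/* Smallest codepoint that legitimately needs n bytes. */
	static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
	size_t n, i;
	uint32_t c;

	assert(s != NULL && cp != NULL);
	if (len == 0)
		return 0;

	/* The lead byte gives the sequence length and the top bits. */
	if (s[0] < 0x80) {
		n = 1; c = s[0];
	} else if ((s[0] & 0xE0) == 0xC0) {
		n = 2; c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		n = 3; c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		n = 4; c = s[0] & 0x07;
	} else {
		return 0;	/* stray continuation or invalid lead byte */
	}
	if (len < n)
		return 0;	/* truncated sequence */

	/* Each continuation byte is 10xxxxxx and adds six bits. */
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (uint32_t)(s[i] & 0x3F);
	}

	if (c < min_cp[n])
		return 0;	/* overlong encoding */
	if (c >= 0xD800 && c <= 0xDFFF)
		return 0;	/* UTF-16 surrogate, not a scalar value */
	if (c > 0x10FFFF)
		return 0;	/* beyond the Unicode range */

	*cp = c;
	return n;
}
```

A caller would loop over a buffer, advancing by the returned byte count
and treating a return of 0 as a decoding error.  Column width is a
separate problem: the codepoints obtained this way still need a width
table of their own, since wcwidth() has the same wchar_t caveat.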
-- 
Matt

