tech-userlevel archive
Re: Unicode programming
On Wed, Oct 05, 2011 at 07:54:47PM -0400, Ken Hornstein wrote:
> >> I'm also wondering what people do about things like finding out
> >> how many columns a particular series of Unicode codepoints occupies
> >
> >This is very much nontrivial. There are a certain number of codepoints
> >which have an ambiguous number of columns. You might also run into
> >situations where the renderer might not be able to display combining
> >diacritics in the expected way.
>
> Is this true for stuff inside of the BMP?
Yeah, they exist within the BMP, mostly among characters that also appear
in legacy CJK/East Asian character sets; see
http://unicode.org/reports/tr11/#Ambiguous for some info.
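For illustration, here's a rough Python sketch of a column-width estimate built on the East Asian Width property. It assumes the caller picks how wide Ambiguous characters are (typically 1 in Western locales and 2 in many CJK ones). That choice comes from the terminal or locale, not from the Unicode data. `display_width` is a hypothetical name, and only the standard `unicodedata` module is used. It treats combining marks as zero-width but ignores control characters, zero-width format characters and emoji sequences, so it is much cruder than a real wcwidth(3)/wcswidth(3).

```python
import unicodedata


def display_width(s: str, ambiguous_width: int = 1) -> int:
    """Estimate how many terminal columns the string s occupies.

    Rough sketch only, not a replacement for wcswidth(3):
    - nonspacing/enclosing combining marks (Mn, Me) count as 0,
    - East Asian Wide (W) and Fullwidth (F) characters count as 2,
    - East Asian Ambiguous (A) characters count as `ambiguous_width`,
      which the caller must choose to match the terminal/locale,
    - everything else counts as 1 (control characters, zero-width
      format characters and emoji sequences are NOT handled).
    """
    width = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue  # combining mark: drawn over the previous character
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F"):
            width += 2
        elif eaw == "A":
            width += ambiguous_width
        else:
            width += 1
    return width
```

Greek small alpha (U+03B1) is Ambiguous, so `display_width("\u03b1")` gives 1 by default but 2 with `ambiguous_width=2`. The same string really can occupy different widths depending on where it's displayed.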
As far as surrogates in UTF-16 go: yeah, they only exist in UTF-16; they're
one of the primary differences between UTF-16 and UCS-2. One of the
_other_ bugaboos with UTF-16 is that you need to keep track of the byte
order and/or insert a BOM to disambiguate what kind of stream you're
generating.
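To make the surrogate and byte-order points concrete, here's a small Python sketch. The function names are hypothetical, and only the standard `codecs` module and built-in codecs are used. `to_surrogate_pair` splits a code point above U+FFFF into the two 16-bit code units UTF-16 uses, which is exactly what UCS-2 has no way to express. `decode_utf16` honours a leading BOM. Without one it falls back to big-endian, following the RFC 2781 recommendation, which is a guess about what the producer meant.

```python
import codecs


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} does not need a surrogate pair")
    v = cp - 0x10000             # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


def decode_utf16(data: bytes, default: str = "utf-16-be") -> str:
    """Decode UTF-16 bytes, using a leading BOM to pick the byte order.

    With no BOM the byte order is ambiguous; `default` is the caller's
    guess (big-endian here, as RFC 2781 recommends).
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    return data.decode(default)


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")
    print(repr(decode_utf16(b"\xfe\xff\x00A")))
    print(repr(decode_utf16(b"\xff\xfeA\x00")))
```

The two bytes `00 41` decode to "A" when read as big-endian but to U+4100 when read as little-endian. That ambiguity is what the BOM is there to resolve.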