Subject: Re: [Summer of Code]Wide Character Support in curses
To: None <tech-userlevel@netbsd.org>
From: James K. Lowden <jklowden@schemamania.org>
List: tech-userlevel
Date: 06/12/2005 20:44:45
Ruibiao Qiu wrote:
> On Tue, 7 Jun 2005, Julian Coleman wrote:
>
> > In order to support these functions, the curses internal storage of
> > characters and attributes needs to be modified. For example, each
> > character position might be described by a structure containing:
> >
> > character value (32 bits)
> > character attributes (32 bits)
> > character width
> > non-spacing character list/pointer
One way to address Thor's concern would simply be to make the character
value size a compile-time constant. Effectively it's 1 byte today; using 2
bytes would meet the vast majority of needs. It's hard to imagine UTF-32
*curses* applications.
ISTM that wide characters have the same attributes as "narrow" ones, so
that storage requirement doesn't change. Looking at
/usr/include/curses.h, attributes in __LDATA are 18 bits (and data 8 bits).
A little preprocessor magic should let you dereference __LDATA differently,
depending on whether you're using 1-byte characters (data in bits 0-7 of
__LDATA) or wider ones (data in the next word).
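
As a rough sketch of that idea, assuming the layout described above (8 data bits below 18 attribute bits in one word): everything here is hypothetical. CURSES_CHAR_BYTES is an invented build-time knob, and cell_t merely stands in for __LDATA; it is not the real structure from curses.h. Callers go through the accessors and never see which layout was compiled in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build-time knob: bytes per stored character (1, 2 or 4). */
#ifndef CURSES_CHAR_BYTES
#define CURSES_CHAR_BYTES 2
#endif

#define ATTR_BITS 18u
#define ATTR_MASK ((UINT32_C(1) << ATTR_BITS) - 1u)

#if CURSES_CHAR_BYTES == 1
/* Narrow build: character in bits 0-7, attributes in bits 8-25. */
typedef uint8_t cell_char_t;
typedef struct { uint32_t word; } cell_t;
#else
/* Wide build: attributes stay in the first word, character in the next. */
#if CURSES_CHAR_BYTES == 2
typedef uint16_t cell_char_t;
#else
typedef uint32_t cell_char_t;
#endif
typedef struct { uint32_t word; cell_char_t ch; } cell_t;
#endif

/* Store a character and its attributes in one cell. */
static inline void cell_set(cell_t *c, uint32_t ch, uint32_t attr)
{
    assert(ch <= (cell_char_t)-1);   /* must fit the configured size */
#if CURSES_CHAR_BYTES == 1
    c->word = (ch & 0xffu) | ((attr & ATTR_MASK) << 8);
#else
    c->word = (attr & ATTR_MASK) << 8;
    c->ch = (cell_char_t)ch;
#endif
}

/* Fetch the character value, whichever layout was compiled in. */
static inline uint32_t cell_ch(const cell_t *c)
{
#if CURSES_CHAR_BYTES == 1
    return c->word & 0xffu;
#else
    return c->ch;
#endif
}

/* Attributes live in the same bits in both layouts. */
static inline uint32_t cell_attr(const cell_t *c)
{
    return (c->word >> 8) & ATTR_MASK;
}
```

Building with -DCURSES_CHAR_BYTES=1 would give back the one-word cell; the default of 2 here is only for illustration.
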
(I'm not sure this is worth the effort and complexity, really. I wonder
what platform/application Thor has in mind that would be materially
affected by even a 4x increase in curses memory? How much memory are we
talking about? The smaller the device, the more languages it needs to
support....)
The width of a given character is fixed; it needn't be stored per cell. E.g.,
'aaaaaa' has only one character width, that of 'a'. And what's the domain
of character widths? 0-3 cells, no? Can any character be wider than
that? That needs only 2 bits/character, or 16K bytes for the whole of UCS-2.
You can use the character value to index into the width map. Actually,
there *is* room in __LDATA for 2 bits of width data, but I'm from the
"never copy, always reference" school of data management, so I would
always refer to the map.
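
A minimal sketch of such a width map, assuming 2 bits per character over the 65536 UCS-2 values, packed four to a byte (16384 bytes in all). The names width_map, width_set and width_get are made up for illustration. The table starts out all zeros, so a real implementation would have to populate it first, perhaps from wcwidth(3) or a generated table.

```c
#include <assert.h>
#include <stdint.h>

#define UCS2_COUNT 65536u
#define WIDTH_BITS 2u

/* One shared table: 65536 characters * 2 bits = 16384 bytes. */
static uint8_t width_map[UCS2_COUNT * WIDTH_BITS / 8];

/* Record the cell width (0-3) of a character value. */
static void width_set(uint16_t ch, unsigned width)
{
    assert(width <= 3);                       /* must fit in 2 bits */
    unsigned shift = (ch & 3u) * WIDTH_BITS;  /* position inside the byte */
    uint8_t *slot = &width_map[ch >> 2];      /* four characters per byte */
    *slot = (uint8_t)((*slot & ~(3u << shift)) | (width << shift));
}

/* Look up a character's width by indexing the map with its value. */
static unsigned width_get(uint16_t ch)
{
    return (width_map[ch >> 2] >> ((ch & 3u) * WIDTH_BITS)) & 3u;
}
```

For example, after width_set(0x4e2d, 2) every cell holding U+4E2D gets its width from width_get(0x4e2d), and nothing extra is stored per cell. A nonspacing character such as U+0301 is simply an entry with width 0.
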
Nonspacing characters have a width of zero. I don't see any advantage to
maintaining a separate list of them. If you do, though, the list will be
short; there aren't many.
Sounds like an interesting project.
--jkl