Subject: Re: [Summer of Code]Wide Character Support in curses
To: Julian Coleman <jdc@coris.org.uk>
From: Ruibiao Qiu <ruibiao@arl.wustl.edu>
List: tech-userlevel
Date: 06/12/2005 16:23:42
On Tue, 7 Jun 2005, Julian Coleman wrote:
> In order to support these functions, the curses internal storage of
> characters and attributes needs to be modified. For example, each
> character position might be described by a structure containing:
>
> character value (32 bits)
> character attributes (32 bits)
> character width
> non-spacing character list/pointer
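For illustration only, here is a minimal C sketch of the quoted per-position layout. The type and function names (wide_cell, nschar, wide_cell_set and so on) are hypothetical. The sketch also assumes that the width field fits in 8 bits and that the non-spacing characters are kept as a singly linked list.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node in the non-spacing (combining) character list. */
struct nschar {
    uint32_t ch;            /* combining character value */
    struct nschar *next;
};

/* Hypothetical per-position cell following the quoted field list. */
struct wide_cell {
    uint32_t ch;            /* character value (32 bits) */
    uint32_t attr;          /* character attributes (32 bits) */
    uint8_t width;          /* columns occupied (assumed to fit in 8 bits) */
    struct nschar *nsp;     /* non-spacing characters drawn over this one */
};

/* Fill a cell with a spacing character and an empty non-spacing list. */
static void wide_cell_set(struct wide_cell *c, uint32_t ch, uint32_t attr,
                          uint8_t width)
{
    assert(width >= 1);     /* a spacing character takes at least 1 column */
    c->ch = ch;
    c->attr = attr;
    c->width = width;
    c->nsp = NULL;
}

/* Append a non-spacing character; returns 0 on success, -1 if out of memory. */
static int wide_cell_add_nonspacing(struct wide_cell *c, uint32_t ch)
{
    struct nschar **tail = &c->nsp;
    struct nschar *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->ch = ch;
    n->next = NULL;
    while (*tail != NULL)   /* walk to the end so order is preserved */
        tail = &(*tail)->next;
    *tail = n;
    return 0;
}

/* Count the non-spacing characters attached to a cell. */
static size_t wide_cell_nonspacing_count(const struct wide_cell *c)
{
    size_t n = 0;

    for (const struct nschar *p = c->nsp; p != NULL; p = p->next)
        n++;
    return n;
}

/* Free the non-spacing list and leave the cell with an empty list. */
static void wide_cell_clear(struct wide_cell *c)
{
    struct nschar *p = c->nsp;

    while (p != NULL) {
        struct nschar *next = p->next;
        free(p);
        p = next;
    }
    c->nsp = NULL;
}
```

Because of alignment padding and the pointer, sizeof(struct wide_cell) is typically 16 bytes with 32-bit pointers and 24 bytes with 64-bit pointers. Any list nodes add to that.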
As Thor pointed out, this layout could quadruple the memory footprint. It can
be argued that wide characters are usually also multi-column characters, so
fewer characters fit on a line and the memory footprint need not quadruple.
However, it could still more than double the memory for certain character
sets, e.g. double-width (two-column) character sets such as simplified and
traditional Chinese, and possibly Japanese Kanji.
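The arithmetic behind this argument can be sketched as follows. The cell sizes are hypothetical placeholders, not measurements. NARROW_CELL_BYTES stands for one cell of the existing storage, and WIDE_CELL_BYTES simply applies the rough four-fold estimate. Both helper names are also made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, not measured values: one cell of the existing
 * storage, and a per-position cell assumed to be four times larger. */
#define NARROW_CELL_BYTES 8u
#define WIDE_CELL_BYTES   (4u * NARROW_CELL_BYTES)

/* Existing layout: one cell per screen column. */
static size_t bytes_per_column_layout(size_t cols, size_t cell_bytes)
{
    return cols * cell_bytes;
}

/* Per-character layout: a full line of `cols` columns holds only
 * cols / char_cols characters when each is `char_cols` columns wide. */
static size_t bytes_per_char_layout(size_t cols, size_t char_cols,
                                    size_t cell_bytes)
{
    assert(char_cols >= 1);
    return (cols / char_cols) * cell_bytes;
}
```

For an 80-column line under these assumptions, the one-cell-per-column baseline takes 640 bytes. Single-width text in the per-character layout takes 2560 bytes (4x). Double-width text needs only 40 cells, or 1280 bytes (2x). Any non-spacing list entries come on top of that, which is why the total can exceed double.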
To reduce memory usage, I propose a structure different from my original one.
It is essentially the same as the existing storage structure: the character
value is still an 8-bit character. In this case, we don't need a width field,
and an m-column wide character uses m storage structures. To preserve the
meaning of the wide characters, we need to add an alignment (or
position-in-word) attribute that marks the start of a wide character. The
value of a wide character can then be recovered with fast bit operations by
combining, in order, the 8-bit values of its cells, beginning with the cell
marked as the start. Similarly, when inserting a wide character, bit
operations can split its value across the m structures and set the alignment
attributes correctly.
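A minimal sketch of the split and recover steps, under several assumptions. The alignment attribute is modeled as a hypothetical CELL_START bit in a 32-bit attribute field. The first cell holds the most significant byte. A character ends at the next start cell or at the end of the line. The names put_wide, get_wide, struct cell and CELL_START are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute bit marking the first cell of a character. */
#define CELL_START 0x80000000u

/* Hypothetical cell: an 8-bit character slice plus attributes. */
struct cell {
    uint8_t ch;       /* one 8-bit slice of the character value */
    uint32_t attr;    /* display attributes plus the CELL_START bit */
};

/* Store `value`, m columns wide, into line[col .. col+m-1], most
 * significant byte first.  The value must fit in 8*m bits. */
static void put_wide(struct cell *line, size_t col, uint32_t value,
                     unsigned m, uint32_t attr)
{
    assert(m >= 1 && m <= 4);
    assert(m == 4 || (value >> (8 * m)) == 0);
    for (unsigned i = m; i-- > 0; ) {   /* fill from the last cell back */
        line[col + i].ch = (uint8_t)(value & 0xffu);
        line[col + i].attr = attr & ~CELL_START;
        value >>= 8;
    }
    line[col].attr |= CELL_START;       /* mark the start of the character */
}

/* Return the value of the character covering column `col` in a line of
 * `ncols` cells.  *startp and *widthp receive its first column and width.
 * Assumes well-formed cells (no character wider than 4 columns). */
static uint32_t get_wide(const struct cell *line, size_t ncols, size_t col,
                         size_t *startp, unsigned *widthp)
{
    size_t start = col, end;
    uint32_t value = 0;

    assert(col < ncols);
    /* Walk back to the cell carrying the start-of-character bit. */
    while (start > 0 && !(line[start].attr & CELL_START))
        start--;
    /* Reassemble the bytes in order until the next start or end of line. */
    end = start;
    do {
        value = (value << 8) | line[end].ch;
        end++;
    } while (end < ncols && !(line[end].attr & CELL_START));
    if (startp != NULL)
        *startp = start;
    if (widthp != NULL)
        *widthp = (unsigned)(end - start);
    return value;
}
```

For example, storing 'A', the two-column U+4E2D and 'B' in a four-cell line puts 0x4E and 0x2D into cells 1 and 2. Reading column 2 returns 0x4E2D with start 1 and width 2. One consequence of this byte-per-column encoding is that a character can only be stored if its value fits in 8*m bits. For example, a two-column character with a code point above 0xFFFF would not fit.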
These are just some initial ideas, and I may have overlooked some points. Any
suggestions and feedback are highly appreciated. Please cc your reply to me,
as I am not currently subscribed to the tech-userlevel mailing list. Thanks.
Ruibiao