Subject: Re: isprint() and isblank()
To: Noriyuki Soda <soda@sra.co.jp>
From: None <itojun@iijlab.net>
List: tech-userlevel
Date: 01/21/2001 11:33:29
>So, the "_B" bit in _ctype_[] is only used for isprint() test,
>and never used for isblank() test.
>As itojun-san pointed out, we cannot set _B bit for '\t' character,
>because it breaks isprint() implementation.
>But it is not a problem, because we do not actually use the _B bit,
>and don't have to use it in the future. We have to use another
>mechanism for isblank() implementation for loadable LC_CTYPE,
>though.
let me recap.
- isblank() is hardcoded for the C locale definition (' ' or '\t').
  isblank() is a function.
- _B is used for isprint(). if _B is true, isprint() becomes true.
  isprint() is a macro.
so, with the current macro/libc function, _B means "isprint() goes
true even if it is not isgraph()". _B does not mean "isblank() goes
true" (which is what _B should mean - chrtbl(8) and many comments say
so).
the real problem is in isprint(), not isblank(), if we keep the
current locale table. and isprint() cannot be replaced in already
compiled binaries, since the macro is expanded into them.
now, locale files:
- if we load an INCORRECT locale table (which sets _B for ' ' only),
  isprint() will behave correctly (true for ' ', false for '\t').
- if we load a CORRECT locale table (which sets _B for ' ' and '\t'),
  isprint() will behave incorrectly (true for both ' ' and '\t').
- the problem has been hidden because the C locale _ctype_ table and
  the old locale table files are all INCORRECT.
- we now ship with a correct locale table, and lib/libc/locale/runeglue.c
  converts it into _ctype_. so _ctype_ can now carry the correct locale
  bit declarations, and isprint() can behave strangely.
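
to make the two cases above concrete, here is a minimal sketch. everything prefixed x_/X_ is hypothetical: the bit values, the table layout and the helper names are made up for illustration and are not the real ctype.h ones. it only assumes what is said above: isprint() is a macro that tests "graph bits or _B", and isblank() is a function hardcoded to ' ' and '\t'.

```c
#include <assert.h>

/* hypothetical class bits for this sketch (not the real ctype.h values) */
#define X_GRAPH 0x01	/* stands in for the punct/upper/lower/digit bits */
#define X_B     0x02	/* the "blank" bit discussed in this mail */

/* macro form: the table lookup and the mask are compiled into every caller */
#define x_isgraph(t, c) ((t)[(unsigned char)(c)] & X_GRAPH)
#define x_isprint(t, c) ((t)[(unsigned char)(c)] & (X_GRAPH | X_B))

/* function form, hardcoded to the C locale definition */
int x_isblank(int c)
{
	return c == ' ' || c == '\t';
}

/* fill a 256-entry table; tab_is_blank = 0 gives the "INCORRECT" table,
 * tab_is_blank = 1 gives the "CORRECT" one */
void x_fill(unsigned char *t, int tab_is_blank)
{
	int c;

	for (c = 0; c < 256; c++)
		t[c] = (c > 0x20 && c < 0x7f) ? X_GRAPH : 0;
	t[' '] |= X_B;
	if (tab_is_blank)
		t['\t'] |= X_B;
}

void x_demo(void)
{
	unsigned char incorrect[256], correct[256];

	x_fill(incorrect, 0);
	x_fill(correct, 1);

	/* INCORRECT table: isprint() happens to give the right answers */
	assert(x_isprint(incorrect, ' ') && !x_isprint(incorrect, '\t'));

	/* CORRECT table: the same compiled macro now accepts '\t' */
	assert(x_isprint(correct, ' '));
	assert(x_isprint(correct, '\t'));

	/* isblank() does not look at either table */
	assert(x_isblank(' ') && x_isblank('\t'));
}
```

the point of the sketch is that only the table contents differ between the two runs; the macro body is identical, which is why fixing the table alone changes the behavior of binaries that are already compiled.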
i can think of a couple of workarounds. the goal is to not break compiled
binaries. now my question is, which looks best? to me (1a) is the
best, but it is a little bit slower than the current code (since we avoid
the macro).
1. keep _ctype_ broken.
1a. change ctype.h and/or lib/libc/gen/isctype.c. basically, do:
	#define isprint(x)	iswprint(x)
this is not a problem since (1) isprint() is defined only for 0x00
to 0xff and -1, and (2) wint_t is really an int. new binaries will
always refer to the correct multibyte locale table.
when we load a locale declaration file, we play some trick with _B.
PROS: no macro, so we no longer need to worry about compiled
macro issues in the future.
CONS: slower.
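
a rough sketch of what (1a) amounts to, seen from outside libc. fn_isprint() is a hypothetical name (a program cannot legally redefine isprint() itself, that has to happen in ctype.h). it follows the assumption stated in (1a) that byte values 0x00 to 0xff can be passed straight to iswprint(); EOF is handled separately here so the sketch does not depend on how wint_t is defined.

```c
#include <assert.h>
#include <stdio.h>	/* EOF */
#include <wchar.h>	/* wint_t */
#include <wctype.h>	/* iswprint() */

/* hypothetical stand-in for an isprint() that goes through the
 * wide-character classification instead of a compiled-in table mask */
int fn_isprint(int c)
{
	if (c == EOF)
		return 0;
	/* assumption from (1a): the byte value is usable as a wint_t */
	return iswprint((wint_t)c) != 0;
}

void fn_demo(void)
{
	/* in the default "C" locale */
	assert(fn_isprint('a'));
	assert(fn_isprint(' '));
	assert(!fn_isprint('\t'));
	assert(!fn_isprint(EOF));
}
```

since every call goes through a function, a later fix in libc reaches all binaries built this way. the cost is the extra function call, which is the CONS listed above.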
1b. same as (1a), but do a macro expansion of
lib/libc/locale/iswctype.c.
PROS: performance comparable to the current code (assuming _CACHED_RUNES
is 255)
CONS: the macro issue remains.
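
one way (1b) could look, as a sketch only. the struct, the bit value and all m_/M_ names are hypothetical and are not taken from iswctype.c; the table size of 256 is chosen here just so that 0x00 to 0xff are all cached. the idea shown is: the macro reads a cached per-locale table directly for small values and falls back to a function otherwise.

```c
#include <assert.h>

#define M_CACHED 256	/* hypothetical stand-in for _CACHED_RUNES */
#define M_PRINT  0x01	/* hypothetical "print" class bit */

struct m_locale {
	unsigned long cached[M_CACHED];	/* class bits for 0 .. M_CACHED-1 */
};

/* slow path for values outside the cached range (stubbed out here) */
unsigned long m_lookup_slow(const struct m_locale *l, int c)
{
	(void)l;
	(void)c;
	return 0;
}

/* note: evaluates c more than once, like most ctype-style macros */
#define m_isprint(l, c) \
	((((c) >= 0 && (c) < M_CACHED) ? (l)->cached[(c)] \
	    : m_lookup_slow((l), (c))) & M_PRINT)

void m_demo(void)
{
	static struct m_locale l;	/* zero-initialized */
	int c;

	for (c = 0x20; c < 0x7f; c++)
		l.cached[c] = M_PRINT;

	assert(m_isprint(&l, ' '));
	assert(m_isprint(&l, 'a'));
	assert(!m_isprint(&l, '\t'));
	assert(!m_isprint(&l, -1));	/* goes through the slow path */
}
```

this keeps the common case as cheap as the current macro, but the layout of the cached table and the meaning of the bit become part of every compiled binary again, which is the macro issue named in the CONS.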
1c. don't change the ctype.h declarations.
when we load a locale declaration file, we play some trick with _B.
PROS: smallest amount of changes.
CONS: new binaries will have incorrect isprint() and isblank(),
forever.
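
the "trick about _B" is not spelled out in this mail. one possible reading, given only as a guess, is: when the locale file is converted into the old byte table, do not copy the blank class into _B, but set _B only for characters that are printable without being graphic. all names and bit values below are hypothetical.

```c
#include <assert.h>

/* hypothetical class bits as they might come from a locale file */
#define SRC_GRAPH 0x01
#define SRC_PRINT 0x02
#define SRC_BLANK 0x04

/* hypothetical bits in the legacy table read by the old isprint() macro */
#define OLD_GRAPH 0x01
#define OLD_B     0x02

/* convert one locale-file entry into one legacy table entry */
unsigned char glue_convert(unsigned src)
{
	unsigned char out = 0;

	if (src & SRC_GRAPH)
		out |= OLD_GRAPH;
	/* SRC_BLANK is deliberately not copied; OLD_B is kept as
	 * "printable but not graphic", which is what the old macro needs */
	if ((src & SRC_PRINT) && !(src & SRC_GRAPH))
		out |= OLD_B;
	return out;
}

void glue_demo(void)
{
	unsigned space = SRC_PRINT | SRC_BLANK;	/* ' '  */
	unsigned tab   = SRC_BLANK;		/* '\t' */
	unsigned a     = SRC_GRAPH | SRC_PRINT;	/* 'a'  */

	assert(glue_convert(space) == OLD_B);
	assert(glue_convert(tab) == 0);
	assert(glue_convert(a) == OLD_GRAPH);
}
```

with a conversion like this, a correct locale file no longer makes the old isprint() macro accept '\t'. the table still carries no usable blank information, so isblank() has to get it from somewhere else.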
2. fix _ctype_.
2a. version it into two tables. when we load a locale declaration file,
we play some trick with _B on the old _ctype_ if the locale
declaration file is correct. we play some trick with _B on the new
_ctype_ if the locale declaration file is incorrect.
discourage people from using old locale declaration files.
CONS: why do we have to maintain two ctype tables when we change
the code? (1a) or (1b) looks much better.
itojun