tech-userlevel archive
Re: Proposal: _ctype_ table bitwidth change
Hi,
> Yes, but that doesn't mean they can't use the same format. The problem
> in this case is that 0xa0 is not a valid UTF8 sequence by itself.
First, don't forget the __STDC_ISO_10646__ wchar_t implementation.
As you know, NetBSD/Citrus is CSI (CodeSet Independent), so wchar_t !=
UCS4 if the encoding is not UTF-8.
But we still have the flexibility to provide a normalized wchar_t == UCS4
implementation, as glibc2 does
(e.g. ja_JP.eucJP@ucs4, ar_AR.ISO-8859-6@ucs4; Tru64 UNIX did this in the past).
For example, consider the Arabic single-byte locale "ar_AR.ISO-8859-6":
[singlebyte] <---> [wide-character]
0xac         <---> U+060C
The same character is 0xac as a byte but U+060C as a wide character, so
clearly we can't share the _ctype_/rl_runetype table.
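A minimal sketch of the indexing problem, under stated assumptions: the ex_* names are hypothetical, only the 0xac <---> U+060C pair comes from this mail, and the Punct classification of that byte is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EX_CTYPE_P 0x0020 /* Punct bit, value taken from the patch in this mail */

/* Hypothetical 256-entry single-byte table, indexed by byte value. */
static const unsigned short ex_ctype_tab[256] = {
    [0xac] = EX_CTYPE_P, /* assumed class of the ISO-8859-6 byte 0xac */
};

/*
 * Hypothetical byte -> UCS-4 mapping for an "@ucs4" style locale.
 * Only ASCII and the single pair quoted in the mail are filled in;
 * every other byte is reported as "not covered by this sketch".
 */
static uint32_t ex_iso8859_6_to_ucs4(unsigned char b)
{
    if (b < 0x80)
        return b;          /* ASCII maps to itself */
    if (b == 0xac)
        return 0x060C;     /* the pair from the mail */
    return UINT32_MAX;     /* not covered here */
}

/* Can this wide-character value index the single-byte table at all? */
static int ex_fits_byte_table(uint32_t wc)
{
    return wc < sizeof(ex_ctype_tab) / sizeof(ex_ctype_tab[0]);
}

/* Small self-check tying the pieces together. */
static void ex_selfcheck(void)
{
    uint32_t wc = ex_iso8859_6_to_ucs4(0xac);

    assert(ex_ctype_tab[0xac] & EX_CTYPE_P); /* byte lookup works */
    assert(!ex_fits_byte_table(wc));         /* U+060C is out of range */
}
```

The byte lookup succeeds, but the wide-character value of the same character falls outside the 256-entry table, so a table keyed by wide character needs its own layout.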
Second, for some restricted environments (embedded devices, old computers, etc.)
we offer the makefile knob WITH_RUNE=NO, which disables multibyte locale support
to reduce libc size.
If you expose the RuneLocale structure, we have to kill the knob and
always use the huge rune locale db file.
# I think that is not good news for some third parties.
> Drop the current _CTYPE_* macros for anything but
> legacy purposes.
My patch has _CTYPE_* macros, but they are not the same as in the legacy ctype.h;
see the following diff:
Index: sys/sys/ctype_bits.h
===================================================================
RCS file: /cvsroot/src/sys/sys/ctype_bits.h,v
retrieving revision 1.2
diff -u -r1.2 ctype_bits.h
--- sys/sys/ctype_bits.h 14 Dec 2010 02:28:57 -0000 1.2
+++ sys/sys/ctype_bits.h 8 Jan 2011 14:01:27 -0000
@@ -40,16 +40,22 @@
#ifndef _SYS_CTYPE_BITS_H_
#define _SYS_CTYPE_BITS_H_
-#define _CTYPE_U 0x01
-#define _CTYPE_L 0x02
-#define _CTYPE_N 0x04
-#define _CTYPE_S 0x08
-#define _CTYPE_P 0x10
-#define _CTYPE_C 0x20
-#define _CTYPE_X 0x40
-#define _CTYPE_B 0x80
+#define _CTYPE_A 0x0001 /* Alpha */
+#define _CTYPE_C 0x0002 /* Control */
+#define _CTYPE_D 0x0004 /* Digit */
+#define _CTYPE_G 0x0008 /* Graph */
+#define _CTYPE_L 0x0010 /* Lower */
+#define _CTYPE_P 0x0020 /* Punct */
+#define _CTYPE_S 0x0040 /* Space */
+#define _CTYPE_U 0x0080 /* Upper */
+#define _CTYPE_X 0x0100 /* X digit */
+#define _CTYPE_B 0x0200 /* Blank */
+#define _CTYPE_R 0x0400 /* Print */
+#define _CTYPE_I 0x0800 /* Ideogram */
+#define _CTYPE_T 0x1000 /* Special */
+#define _CTYPE_Q 0x2000 /* Phonogram */
-extern const unsigned char *_ctype_;
+extern const unsigned short *_ctype_tab_;
extern const short *_tolower_tab_;
extern const short *_toupper_tab_;
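To illustrate why the table element type has to widen, here is a rough sketch of a lookup against a 16-bit table. The ex_* names, the one-slot offset for EOF (-1), and the simplified ASCII-only table contents are assumptions for illustration, not taken from the patch; only the bit values are.

```c
#include <assert.h>

/* Bit values copied from the patch; names prefixed to avoid clashes. */
#define EX_CTYPE_A 0x0001 /* Alpha */
#define EX_CTYPE_D 0x0004 /* Digit */
#define EX_CTYPE_X 0x0100 /* X digit: does not fit in an unsigned char */

/* Hypothetical table: slot 0 is for EOF (-1), slots 1..256 for bytes. */
static unsigned short ex_tab_storage[1 + 256];
static const unsigned short *ex_ctype_tab = ex_tab_storage;

/* Fill in a simplified ASCII-only classification (assumes ASCII order). */
static void ex_init_table(void)
{
    int c;

    for (c = '0'; c <= '9'; c++)
        ex_tab_storage[1 + c] = EX_CTYPE_D | EX_CTYPE_X;
    for (c = 'a'; c <= 'z'; c++)
        ex_tab_storage[1 + c] = EX_CTYPE_A;
    for (c = 'A'; c <= 'Z'; c++)
        ex_tab_storage[1 + c] = EX_CTYPE_A;
    for (c = 'a'; c <= 'f'; c++)
        ex_tab_storage[1 + c] |= EX_CTYPE_X;
    for (c = 'A'; c <= 'F'; c++)
        ex_tab_storage[1 + c] |= EX_CTYPE_X;
}

/* Lookup in the style of a ctype macro; valid for c in -1..255. */
static int ex_isxdigit(int c)
{
    return ((ex_ctype_tab + 1)[c] & EX_CTYPE_X) != 0;
}
```

With the new numbering, the X digit, Blank, Print, Ideogram, Special and Phonogram bits all sit above 0x80, so an unsigned char element would silently lose them.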
The relation between _CTYPE_* and _RUNETYPE_* is:
(_CTYPE_A << 8) == _RUNETYPE_A
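As a small sketch of that relation: the mail states it only for _CTYPE_A, so applying the same shift to every bit is an assumption here, and the ex_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values copied from the patch; names prefixed to avoid clashes. */
#define EX_CTYPE_A 0x0001 /* Alpha */
#define EX_CTYPE_Q 0x2000 /* Phonogram */

#define EX_RUNETYPE_SHIFT 8 /* from (_CTYPE_A << 8) == _RUNETYPE_A */

/* Widen 16-bit ctype bits into the rune type bit positions. */
static uint32_t ex_ctype_to_runetype(unsigned short bits)
{
    return (uint32_t)bits << EX_RUNETYPE_SHIFT;
}

/*
 * Narrow rune type bits back to ctype bits. Any rune type bits outside
 * bits 8..23 are dropped by the shift and the cast.
 */
static unsigned short ex_runetype_to_ctype(uint32_t rt)
{
    return (unsigned short)(rt >> EX_RUNETYPE_SHIFT);
}
```

If the assumption holds, a single-byte table could be derived from rune type data with one shift per entry instead of a per-bit translation.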
very truly yours.
--
Takehiko NOZAKI<takehiko.nozaki%gmail.com@localhost>