tech-userlevel archive
Re: Proposal: _ctype_ table bitwidth change
> > The most important point is that is* functions accept an octet, not a
> > code point.
>
> They do? Where is this defined?
>
> Historically, it has been false: is*() has been documented to accept
> "characters", which I can't read as anything but codepoints.
>
> That some charsets have some codepoints that can't fit in unsigned char
> (at least when, as on NetBSD, unsigned char is just one octet) just
> means that is*() aren't useful for more than just 256 of their possible
> codepoints, not that they somehow get retconned to take just one octet
> of a storage encoding of a codepoint.
>
> At least, that's how I read it. Is there a spec somewhere which spells
> this out precisely?
As far as I know, there is no explicit description.
However, to begin with, ISO C does not define any concept like "codepoint."
It defines only two representations: "(single-byte/multibyte) character" and
"wide character".
I don't see how the is* functions could be specified in terms of a concept
that the standard itself leaves undefined.
In addition, ISO C contains a passage implying that the is* functions accept
an "octet":
7.25.2.1 Wide character classification functions:
Each of the following functions (note: isw* functions) returns true
for each wide character that corresponds (as if by a call to the wctob
function) to a single-byte character for which the corresponding
character classification function (note: is* functions) from 7.4.1
returns true, except that the iswgraph and iswpunct functions may
differ with respect to wide characters other than L' ' that are both
printing and white-space wide characters.
(The parenthetical notes are mine.)
Note that this passage was added in the 1995 revision (C95).
ISO C seems to be somewhat ambiguous about what "character" means,
especially in the parts that date back to 1989 (C89).
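
To illustrate the correspondence quoted from 7.25.2.1, here is a rough
sketch that checks it for one pair of functions, isalpha and iswalpha.
The helper name count_mismatches is hypothetical, and the sketch assumes
that scanning a small range of wide character values in the current
locale is enough to show the idea. It is not a conformance test.

```c
#include <ctype.h>
#include <stdio.h>   /* EOF */
#include <wchar.h>   /* wctob, wint_t */
#include <wctype.h>  /* iswalpha */

/*
 * Hypothetical helper: count wide characters in [0, limit) that break
 * the rule quoted above, i.e. wctob() maps the wide character to a
 * single-byte character for which isalpha() is true, yet iswalpha()
 * is false for the wide character itself.
 */
static int count_mismatches(wint_t limit)
{
    int bad = 0;

    for (wint_t wc = 0; wc < limit; wc++) {
        int c = wctob(wc);      /* single byte as unsigned char, or EOF */

        if (c == EOF)
            continue;           /* no single-byte representation */
        if (isalpha(c) && !iswalpha(wc))
            bad++;
    }
    return bad;
}
```

Note that the value handed to isalpha here is whatever wctob returns: one
byte as an unsigned char, or EOF. On an implementation that follows the
quoted text, count_mismatches(0x100) should return 0 in any locale.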
---
Takuya SHIOZAKI