tech-userlevel archive
Re: using the interfaces in ctype.h
> On 21-Apr-08, at 1:57 PM, Joerg Sonnenberger wrote:
>> On Mon, Apr 21, 2008 at 01:52:41PM -0400, Greg A. Woods; Planix,
>> Inc. wrote:
>>> Actually, no, it doesn't, at least on NetBSD. Try it! :-)
>>
>> Sure. See attached code.
> You must have some different release of NetBSD than any I have. I get
> "0 0" from your program on stock 4.0, a netbsd-4 branch ("4.0_STABLE")
> system, and of course on 1.6.2 too. Same on Mac OS X 10.5.2 as well.
I didn't follow the entire discussion, but I have one note.
Code like the following is totally wrong:
    char a = ...
    ...
    is*(a)
or
    char a = ...
    ...
    to{lower,upper}(a)
People who speak Slavic languages such as Russian, Belarusian and
Ukrainian know this VERY WELL, because the most widely used charsets
(KOI8-R/KOI8-U and CP-1251) assign a letter to code 255 (0xFF):
in KOI8-R it is the upper-case HARD SIGN, in CP1251 the lower-case YA.
The code above is wrong because, where char is signed, (char) 0xFF
is -1. So, for example,
toupper((char) lower_case_ya_letter) returns lower_case_ya_letter, not
UPPER_CASE_YA_LETTER,
isalpha((char) lower_case_ya_letter) returns 0 (false), etc.
All this is because EOF is typically -1,
tolower(EOF) == toupper(EOF) == EOF, and is*(EOF) == 0.
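
A minimal sketch of the usual fix, assuming an 8-bit char: convert the
char to unsigned char before handing it to any is*/to* function, so that
0xFF arrives as 255 and can never be mistaken for EOF. The helper names
upcase_in_place and ctype_arg are made up for this illustration.

```c
#include <ctype.h>

/* Hypothetical helper: upper-case a NUL-terminated string in place. */
void upcase_in_place(char *s)
{
    for (; *s != '\0'; s++) {
        /* Cast to unsigned char first: the argument is then always
         * in 0..UCHAR_MAX, whether plain char is signed or not. */
        *s = (char)toupper((unsigned char)*s);
    }
}

/* Hypothetical helper: the value a plain char should be passed as. */
int ctype_arg(char c)
{
    return (unsigned char)c;
}
```

Whether a byte such as 0xFF is then actually mapped to its upper-case
form still depends on the current locale; the cast only guarantees that
the function is called with a valid argument.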
There are LOTS of programs with this kind of bug.
Another problem is that the bug is NOT seen on Linux,
because its to* and is* functions
work "correctly" with negative values in the range [-128..-2].
As a result, those who live with iso-8859-* locales
do not see this problem at all. In these charsets 0xFF is either
undefined or a rarely used character (ÿ in ISO-8859-1).
http://www.opengroup.org/onlinepubs/009695399/functions/tolower.html
http://www.opengroup.org/onlinepubs/009695399/functions/toupper.html
http://www.opengroup.org/onlinepubs/009695399/functions/isalpha.html
To alert developers to this problem, the to* and is* functions
should work like this:
    int toupper (int c)
    {
        assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
        ...
    }
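
The same check can be tried out today without touching libc. Below is a
sketch of a debugging wrapper; the name checked_toupper is hypothetical
and not part of any library.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical debugging wrapper around toupper(): aborts when the
 * argument is outside the domain the C standard allows, which is
 * EOF or a value representable as unsigned char. */
int checked_toupper(int c)
{
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return toupper(c);
}
```

Note that where EOF is -1 and char is signed, a char holding 0xFF still
passes this check, because it is indistinguishable from EOF. The
assertion only catches the other negative values, which is usually
enough to expose code that passes a plain char.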
--
Best regards, Aleksey Cheusov.