NetBSD-Bugs archive
lib/58209: <cctype> lacks compile-time diagnostics for char abuse
>Number: 58209
>Category: lib
>Synopsis: <cctype> lacks compile-time diagnostics for char abuse
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: lib-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Apr 28 15:00:02 +0000 2024
>Originator: Taylor R Campbell
>Release: current, 10, 9, ...
>Organization:
The NetBSD Foundation
>Environment:
>Description:
The <cctype> functions, such as std::isprint/isdigit/isalpha and std::toupper/tolower, have a singularly troublesome specification: Their argument has type int, but they are only defined on inputs that are either (a) the value of the EOF macro (which on NetBSD is -1), or (b) representable by unsigned char. In other words, there are exactly 257 allowed inputs: {-1, 0, 1, 2, 3, ..., 255}. Any other inputs lead to undefined behaviour.
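For illustration, the 257-value domain can be written as a predicate. This is a minimal sketch: in_ctype_domain and checked_isspace are hypothetical helpers, not part of NetBSD or any C++ library. EOF comes from <cstdio> and UCHAR_MAX from <climits>, and the wrapper only catches bad arguments at run time, and only when NDEBUG is not defined.

```cpp
#include <cassert>
#include <cctype>
#include <climits>
#include <cstdio>

// True exactly for the arguments the <cctype> functions accept:
// EOF, or any value representable by unsigned char (0 .. UCHAR_MAX).
constexpr bool
in_ctype_domain(int c)
{
	return c == EOF || (0 <= c && c <= UCHAR_MAX);
}

// Hypothetical debug wrapper: aborts on an out-of-domain argument
// (in builds without NDEBUG) instead of invoking undefined behaviour.
inline int
checked_isspace(int c)
{
	assert(in_ctype_domain(c));
	return std::isspace(c);
}
```

On NetBSD, where EOF is -1 and UCHAR_MAX is 255, the predicate accepts exactly {-1, 0, 1, ..., 255}.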
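The functions are specified this way because they are meant for use with I/O functions like std::istream::get and std::istream::peek, which return either EOF or a byte value representable by unsigned char: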
int ch;
while ((ch = std::cin.get()) != EOF) {
	if (std::isspace(ch))
		...
}
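A runnable variant of the loop above, as a sketch: count_spaces is a hypothetical name, and the assert documents the assumption that std::istream::get returns either EOF or the byte converted through std::char_traits<char>::to_int_type, which in the usual implementations (libstdc++, libc++) yields a value in 0 .. UCHAR_MAX. Unlike peek(), get() consumes the byte, so the loop terminates.

```cpp
#include <cassert>
#include <cctype>
#include <climits>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Counts whitespace bytes in a stream.  get() consumes one byte per
// iteration and returns either EOF or a value already in the
// <cctype> domain, so no cast is needed before std::isspace.
inline long
count_spaces(std::istream &in)
{
	long n = 0;
	int ch;

	while ((ch = in.get()) != EOF) {
		// The contract that makes std::isspace(ch) safe.
		assert(0 <= ch && ch <= UCHAR_MAX);
		if (std::isspace(ch))
			n++;
	}
	return n;
}

// Convenience overload for in-memory data.
inline long
count_spaces(const std::string &s)
{
	std::istringstream in(s);
	return count_spaces(in);
}
```

Even a byte such as 0xb5 arrives from get() as 181, not as a negative number, which is why this pattern needs no explicit conversion.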
Using them to process arbitrary contents of, e.g., std::string requires explicit conversion to unsigned char:
std::string s = ...;
for (std::string::size_type i = 0; i < s.size(); i++) {
	if (std::isspace(static_cast<unsigned char>(s[i])))
		...
}
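The same conversion applies when handing characters to standard algorithms. A sketch assuming C++11 lambdas; count_spaces_in and to_lower_copy are hypothetical names. Declaring the lambda parameter as unsigned char makes each char convert at the call boundary, so std::isspace and std::tolower only ever see in-domain values.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Each element is converted char -> unsigned char when the lambda is
// called, so std::isspace receives a value in 0 .. UCHAR_MAX.
inline std::string::difference_type
count_spaces_in(const std::string &s)
{
	return std::count_if(s.begin(), s.end(),
	    [](unsigned char c) { return std::isspace(c) != 0; });
}

// Same idea for std::tolower: convert to unsigned char on the way in,
// back to char on the way out for storage in the string.
inline std::string
to_lower_copy(std::string s)
{
	std::transform(s.begin(), s.end(), s.begin(),
	    [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
	return s;
}
```

Writing the lambda parameter as char instead would silently reintroduce the bug, with no diagnostic.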
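Without this conversion, on machines where char is signed, such as x86, char values outside the 7-bit US-ASCII range (octets 0x80 through 0xff) become the negative ints -128 through -1. Passing them is therefore either (a) undefined behaviour, or (b), in the case of the all-bits-set octet 0xff, which becomes -1, conflated with EOF.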
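The arithmetic behind (a) and (b) is just the char-to-int conversion. A small sketch contrasting the argument std::isspace receives with and without the cast; the helper names are hypothetical.

```cpp
#include <cstdio>
#include <limits>

// True on platforms (such as x86) where plain char is signed.
constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;

// What std::isspace(s[i]) receives: the char converted to int, which
// maps octets 0x80..0xff to -128..-1 when char is signed.
inline int
unconverted_arg(char c)
{
	return c;
}

// What std::isspace(static_cast<unsigned char>(s[i])) receives:
// always a value in 0 .. UCHAR_MAX.
inline int
converted_arg(char c)
{
	return static_cast<unsigned char>(c);
}

// Case (b): the unconverted octet is indistinguishable from EOF.
inline bool
looks_like_eof(char c)
{
	return unconverted_arg(c) == EOF;
}
```

With signed char, 0xb5 arrives as -75 (case (a)) and 0xff arrives as -1, which is EOF (case (b)); with the cast they arrive as 181 and 255.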
Our standard C <ctype.h> definitions are crafted to trigger the -Wchar-subscripts compiler warning: isspace(c), for example, is a macro that expands into ((_ctype_tab_ + 1)[c] & bits), so passing a char argument means subscripting an array with a char. That trick doesn't work in C++, because we can't expand `std::isspace(c)' into `std::((_ctype_tab_ + 1)[c] & bits)'. So C++ code that abuses ctype (like https://github.com/ledger/ledger/issues/2340) gets no compile-time feedback, and only poor run-time feedback (https://gnats.netbsd.org/58208), which leads to merely confusing behaviour (like https://github.com/ledger/ledger/issues/2338).
>How-To-Repeat:
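#include <cctype>
#include <iostream>
#include <string>
int main() {
	std::string s = {static_cast<char>(0xb5), 0};	// s[0] is -75 if char is signed
	std::cout << std::isspace(s[0]) << std::endl;	// undefined behaviour, yet no warning
}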
>Fix:
Maybe we can teach <cctype> to provide overloads of isspace &c., or find some template magic, that trigger a warning at compile time when given a char argument.
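One possible shape for the overload idea, as a sketch only. It assumes C++14 [[deprecated]] and uses a hypothetical namespace ctype_check rather than touching std, since user code may not add declarations to namespace std; an actual fix would have to live in the <cctype> implementation. It relies on overload resolution preferring an exact match (char, signed char) over the promotion to int.

```cpp
#include <cctype>

// Hypothetical namespace; not part of NetBSD, libc++, or libstdc++.
namespace ctype_check {

// Normal entry point: EOF or a value in 0 .. UCHAR_MAX, e.g. from
// std::istream::get() or an explicit static_cast<unsigned char>.
// An unsigned char argument promotes to int and lands here.
inline int
isspace(int c)
{
	return std::isspace(c);
}

// Exact matches for char and signed char beat the int overload, so
// these arguments select a deprecated function and the compiler
// warns (-Wdeprecated-declarations in GCC and Clang).  The bodies
// convert through unsigned char, so the call is still defined.
[[deprecated("convert to unsigned char before calling isspace")]]
inline int
isspace(char c)
{
	return std::isspace(static_cast<unsigned char>(c));
}

[[deprecated("convert to unsigned char before calling isspace")]]
inline int
isspace(signed char c)
{
	return std::isspace(static_cast<unsigned char>(c));
}

} // namespace ctype_check
```

With this sketch, ctype_check::isspace(s[0]) from the How-To-Repeat example compiles with a deprecation warning, while ctype_check::isspace(static_cast<unsigned char>(s[0])) and ctype_check::isspace(ch) for an int from get() compile cleanly. One trade-off: a character literal such as ctype_check::isspace('a') also has type char and would warn even though it is in range. Declaring the char overloads `= delete` instead would turn the warning into a hard error.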