NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

lib/58913: ctype(3) macros fail on (unsigned)EOF



>Number:         58913
>Category:       lib
>Synopsis:       ctype(3) macros fail on (unsigned)EOF
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Dec 18 02:25:00 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10, 9, ...
>Organization:
The isnetbsd EOF unsigned
>Environment:
>Description:
In order to provoke the compiler to detect abuse of the ctype.h functions (see man 3 ctype, or https://man.NetBSD.org/ctype.3, for the gory details), we define them to be macros like so:

#define	isspace(c)	((int)((_ctype_tab_ + 1)[(c)] & _CTYPE_S))

This way, when c is an expression of type char, gcc will warn about -Wchar-subscripts (which is included in -Wall).

Unfortunately, there is an argument that this implementation is also incorrect.  The specification is:

	The header <ctype.h> declares several functions useful for classifying and mapping characters.  In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF.  If the argument has any other value, the behavior is undefined.

The functions are then described as, e.g.:

	int isspace(int c);

If you call such a function as isspace((unsigned)EOF), the effect should be identical to calling it as isspace((int)(unsigned)EOF) or just isspace(EOF), since EOF is specified to expand to an integer constant expression with type int (and a negative value).

But when c is an expression of type unsigned, say (unsigned)-1, the fragment (_ctype_tab_ + 1)[(c)] behaves very differently: instead of returning the element of the array one entry _before_ (_ctype_tab_ + 1), it returns the element of the array four billion, two hundred ninety-four million, nine hundred sixty-seven thousand, two hundred ninety-five entries _after_ (_ctype_tab_ + 1).  Which (in a >32-bit universe, anyway, like a modern PDP-10 variant with a 36-bit address space) lands squarely in undefined territory, somewhere in the vicinity of SIGSEGV, a demon-emitting nostril, and a security vulnerability.  Pooh.

Now, there is an argument that _no_ value of type unsigned `equal[s] the value of the macro EOF', because the value of the macro EOF is prescribed to be negative, so maybe we could hide behind that argument.  But apparently there's a real-world application that tries this anyway: https://mail-index.netbsd.org/current-users/2024/12/15/msg045888.html
>How-To-Repeat:
isspace((unsigned)EOF)
>Fix:
Maybe we can use __builtin_choose_expr and __builtin_types_compatible_p:

#define	__ctype_lookup(tab, c)						      \
	(int)__builtin_choose_expr(					      \
	    __builtin_types_compatible_p(__typeof__(c), char),		      \
	    ((tab) + 1)[c],						      \
	    ((tab) + 1)[(int)(c)])
#define	isspace(c)	(__ctype_lookup(_ctype_tab_, c) & _CTYPE_S)
#define	tolower(c)	(__ctype_lookup(_tolower_tab, c))

This seems to work in cursory tests.  We could define __ctype_lookup conditionally depending on __GNUC__ (and maybe get ragge@ to make sure pcc has a way to trigger the warning too) so it doesn't break various other consumers of ctype.h.

Or we could declare that applications passing (unsigned)EOF are broken because there is a halfway-credible reading of the spec that we can claim to hide behind.



Home | Main Index | Thread Index | Old Index