NetBSD-Bugs archive


Re: lib/58913: ctype(3) macros fail on (unsigned)EOF



> Date: Wed, 18 Dec 2024 16:47:11 +0700
> From: Robert Elz <kre%munnari.OZ.AU@localhost>
> 
>     Date:        Wed, 18 Dec 2024 02:25:01 +0000 (UTC)
>     From:        campbell+netbsd%mumble.net@localhost
>     Message-ID:  <20241218022501.3BA4A1A923B%mollari.NetBSD.org@localhost>
> 
>   | Now, there is an argument that _no_ value of type unsigned
>   | `equal[s] the value of the macro EOF', because the value of the
>   | macro EOF is prescribed to be negative, so maybe we could hide
>   | behind that argument.
> 
> The argument is entirely valid, there is no hiding involved.
> 
> (unsigned)EOF is not equal to EOF, nor can its value be
> represented by an unsigned char, hence by the text that you
> quoted, the behaviour is explicitly undefined.

Assume EOF is -1 (i.e., is an expression of type int that evaluates to
the value -1).

Suppose you have a function with the prototype:

	int foo(int c);

Consider the call:

	int y = foo((unsigned)EOF);

What argument value is passed to the function foo?

Under C99 6.5.2.2 Function calls (p. 72):

	7. If the expression that denotes the called function has a
	   type that does include a prototype, the arguments are
	   implicitly converted, as if by assignment, to the types of
	   the corresponding parameters, taking the type of each
	   parameter to be the unqualified version of its declared
	   type.

So this function call is equivalent to:

	int x = (unsigned)EOF;
	int y = foo(x);

There are two conversions here, from int to unsigned and back.  The
semantics of conversions is specified in C99 6.3 Conversions,
subsection 6.3.1.3 Signed and unsigned integers (p. 43).  The
conversion from int EOF to unsigned is given by:

	2. Otherwise, if the new type is unsigned, the value is
	   converted by repeatedly adding or subtracting one more than
	   the maximum value that can be represented in the new type
	   until the value is in the range of the new type.

Thus, the value of (unsigned)EOF is UINT_MAX.  The conversion from
unsigned UINT_MAX to int is given by:

	3. Otherwise, the new type is signed and the value cannot be
	   represented in it; either the result is
	   implementation-defined or an implementation-defined signal
	   is raised.

The three choices for the conversion from unsigned UINT_MAX to int, in
principle, are:

(a) raise a signal
(b) yield something other than -1
(c) yield -1

In practice, though, on all NetBSD ports, converting unsigned UINT_MAX
to int yields -1.

So the argument value of type int that is passed to foo, under 6.5.2.2
Function calls, is -1.

(That is, unless you want to argue that in NetBSD the conversion
should raise a signal or return something other than the two's
complement answer for this direction of conversion.)

Under 7.4 Character handling <ctype.h>, all of these functions are
given the same prototype, e.g. 7.4.1.10 The isspace function, p. 183:

	Synopsis

	1.	#include <ctype.h>
		int isspace(int c);

There is language allowing library functions to be additionally
implemented as function-like macros in C99 7.1.4 Use of library
functions:

	1. Each of the following statements applies unless explicitly
	   stated otherwise in the detailed descriptions that follow:
	   ... Any function declared in a header may be additionally
	   implemented as a function-like macro defined in the header.

I can't find any language allowing the function-like macro to have
different evaluation rules from a function, here or in 7.4, except
with respect to sequence points.  There is only language requiring the
function-like macro _not_ to differ in certain ways:

	   ... Any invocation of a library function that is
	   implemented as a macro shall expand to code that evaluates
	   each of its arguments exactly once, fully protected by
	   parentheses where necessary, so it is generally safe to use
	   arbitrary expressions as arguments.^162 ...

	Footnotes:
	162) Such macros might not contain the sequence points that
	     the corresponding function calls do.

If there is an expression E for which the function call (isspace)(E)
has different semantics, except for the sequence points, from the
macro expansion of isspace(E), that looks like a bug to me.

