Johnny Billquist <bqt%update.uu.se@localhost> writes:

> On 2021-05-06 13:06, Greg Troxel wrote:
>>
>> Johnny Billquist <bqt%update.uu.se@localhost> writes:
>>
>>>> See CAVEATS in ctype(3).
>>>
>>> Right. But is gcc really smart enough to understand at compile time if
>>> something else than -1 is the negative value, and that toupper in fact
>>> is more limited than what the signature says?
>>>
>>> The *signature* of the function is int toupper(int). If you pass a
>>> char to that, I can't see that there would ever be a warning about any
>>> problems.
>>
>> The signature is that it takes an int, but the specification is that if
>> the value of the int is other than EOF or something representable as
>> unsigned char (projecting to the implementation, meaning -1 is ok and
>> 0..255 is ok), then you get UB.
>
> Right. My question is just how on earth gcc would know this? There is
> nothing in the actual declaration that tells (or even can) tell
> this. So this would then (again) be an example of gcc knowing more
> about the function than is actually visible.

What is going on is that there is a header file that defines toupper as
a macro (#define).  After expanding that, gcc sees code that is using a
char (which is signed on this platform) as an array subscript, which can
reasonably be expected to have the possibility of out-of-bounds reads.
gcc is AFAIK not bringing knowledge of toupper.

> What if I wrote my own function called toupper, which was defined for
> the full range of an int? Would gcc then understand that this is a
> different toupper that it shouldn't warn about?

toupper is specified by C99.  So yes, you could implement a version that
had safer behavior when it is formally undefined.

If gcc had the specification expressed somehow, and could do static
analysis, and gave a warning that "call to toupper could lead to
undefined behavior", then I think that would be great.  But it isn't
doing that -- and that's more like UBsan than a compiler anyway.
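To make the macro point above concrete, here is a rough sketch of what a table-based definition can look like.  The names sketch_upper_tab, SKETCH_TOUPPER and sketch_init are made up for illustration; this is not the actual header, and it assumes ASCII and EOF == -1.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Hypothetical lookup table: slot 0 is for EOF, slots 1..UCHAR_MAX+1
 * are for the unsigned char values 0..UCHAR_MAX. */
static int sketch_upper_tab[UCHAR_MAX + 2];

/* Hypothetical macro: nothing but an array subscript after expansion. */
#define SKETCH_TOUPPER(c) ((sketch_upper_tab + 1)[(c)])

/* Fill the table; assumes ASCII letters and EOF == -1. */
static void sketch_init(void)
{
    assert(EOF == -1);              /* the +1 offset relies on this */
    sketch_upper_tab[0] = EOF;      /* reached by subscript -1 */
    for (int i = 0; i <= UCHAR_MAX; i++)
        sketch_upper_tab[i + 1] = (i >= 'a' && i <= 'z') ? i - 'a' + 'A' : i;
}

/*
 * With "char ch", SKETCH_TOUPPER(ch) expands to
 *     ((sketch_upper_tab + 1)[(ch)])
 * so the subscript has type char.  gcc -Wall (-Wchar-subscripts) warns
 * about that, and where char is signed any value below -1 reads before
 * the start of the table.
 */
```

With a definition along these lines, the compiler needs no knowledge of toupper itself; the warning comes purely from the type of the subscript in the expanded code.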
A lot of trouble is caused by people writing code that's ok with the implementation in front of them, but that ventures into UB per the standard. So I don't think such code should be accommodated in general.
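For what it's worth, staying within the standard here is cheap.  A minimal sketch, where upcase_in_place is a made-up helper name and the cast follows the usual advice in the ctype(3) CAVEATS:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: uppercase a NUL-terminated string in place. */
static void upcase_in_place(char *s)
{
    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Convert to unsigned char first so the argument is always in
         * the range toupper() is specified for. */
        s[i] = (char)toupper((unsigned char)s[i]);
    }
}
```

The cast costs nothing at run time and keeps the call defined whether char is signed or unsigned on the platform at hand.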