At Thu, 06 May 2021 07:06:43 -0400, Greg Troxel <gdt%lexort.com@localhost> wrote: Subject: Re: toupper and warnings > > Johnny Billquist <bqt%update.uu.se@localhost> writes: > > >> See CAVEATS in ctype(3). > > > > Right. But is gcc really smart enough to understand at compile time if > > something else than -1 is the negative value, and that toupper in fact > > is more limited than what the signature says? > > > > The *signature* of the function is int toupper(int). If you pass a > > char to that, I can't see that there would ever be a warning about any > > problems. Well the warning is actually an almost accidental side effect due to the implementation as a macro and due to the way the particular way the macro is expanded into an array reference. The potential to access an array with a negative index is obvious to the compiler in the current standard NetBSD implementation. Note too that actually implementing some of the ctype.h style interfaces as real functions on a system using signed chars could be problematic as they could then not internally distinguish between -1 and 0xFF when passed a plain un-cast signed char value (a value of 0xFF is of course a valid character, but due to the default integer promotions for all function parameters the sign will be extended and the result will be (int) -1). This matters for iscntrl(), isalpha(), and potentially also toupper() and tolower() depending on exactly how the return value is used. > To answer Greg Woods, It is not "conservative" to silently accept > UB-causing inputs and turn them into UB array reads. Well I don't actually do that. Which is to say my implementation's goal is to turn them into properly promoted values and access the flags array in such a way that this does not and cannot "cause" UB. My implementation carefully tests for a -1 and then if it is not -1 it uses a cast and assignment to an unsigned char, the value of which is then used to access the array. I.e. totally safe _and_ UB-free. Thus I take the conservative approach of not causing anything bad or unexpected to happen, including not causing UBSAN to abort the program. The only bad thing is that I do is to silently coerce truly badly behaving code into continuing to work without complaint, even from UBSAN. My test code for my implementation is here: https://github.com/robohack/experiments/blob/master/tctype.c So far it's only been tested with a Clang and a range of GCC versions. I can send the patch that adds it to -current too if you'd like to see it in-situ. > Another approach would be for NetBSD to adust the code to bounds check > before indexing and do something if the value is out of bounds. Yes, and my implementation could also easily be adjusted to do that. > Options > are returning false, returning a random bit, calling abort(3), trying to > send all of the user's files to a random IP addreess, and so on. All > are allowed by the spec :-) Yeah, "Undefined Behaviour" should be undefined -- i.e. removed from the spec -- i.e. become either fully defined or at least implementation defined. It is not helpful at all -- it was a very VERY bad idea. E.g. for ctype.h interfaces the spec should just say that values outside the recognized range will simply be truncated as if by assignment to an unsigned char. > Probably returning false with an envvar option to abort is best, similar > to how UB from mutex ops is handled. On the other hand perhaps this would better be a job for the UBSAN module to do, but perhaps that's just because I've become much more comfortable with using UBSAN with Clang in recent years. SIGILL for the losers. I'm not sure how my implementation could be so hooked into UBSAN tests though, especially in a less compiler-dependent way. > When we started with abort it > seemed there was a vast amount of incorrect code written and debugged on > other operating systems that allow code specified to hav UB. I haven't had quite that experience, though there have been some rare but stunning examples of problematic code -- though all I can remember were actually caught while porting the code to run on NetBSD/alpha, or in a few cases when porting to platforms that actually SIGBUS on unaligned accesses (like sparc IIRC?). What I am pretty sure of though is that there's a vast difference between the massive number of warnings spit out by the compiler vs. the relatively low number of actual cases of passing values outside of -1..255. We certainly wouldn't want to claim UB and abort for all of the warnings! -- Greg A. Woods <gwoods%acm.org@localhost> Kelowna, BC +1 250 762-7675 RoboHack <woods%robohack.ca@localhost> Planix, Inc. <woods%planix.com@localhost> Avoncote Farms <woods%avoncote.ca@localhost>
Attachment:
pgpka1R3YO42t.pgp
Description: OpenPGP Digital Signature