NetBSD-Bugs archive


Re: standards/58601: uchar.h C23 compliance: char8_t, mbrtoc8, c8rtomb



> Date: Fri, 16 Aug 2024 18:01:14 +1000
> from: matthew green <mrg%eterna23.net@localhost>
> 
> > +typedef unsigned char		char8_t;
> 
> could / should this check CHAR_BIT == 8 before defining?

C23, Sec. 7.30 `Unicode utilities <uchar.h>', clause 3:

   The types declared are ...

	char8_t

   which is an unsigned integer type used for 8-bit characters and is
   the same type as unsigned char; ...

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf#page=426

So the definition does not depend on CHAR_BIT: char8_t is unsigned
char whether CHAR_BIT is exactly 8 or larger.

Note that char16_t and char32_t may be wider than 16 or 32 bits,
respectively -- they are specified to be uint_least16_t and
uint_least32_t, not uint16_t and uint32_t.
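
To make that concrete, here is a minimal sketch of compile-time checks
that hold regardless of CHAR_BIT.  It assumes a C11-or-later compiler
with <uchar.h>; example_char8_t and example_char8_bits are hypothetical
names standing in for the patch's typedef, so that this compiles even
where <uchar.h> does not yet provide char8_t.

```c
#include <assert.h>	/* static_assert (C11) */
#include <limits.h>	/* CHAR_BIT, UCHAR_MAX */
#include <stdint.h>	/* uint_least16_t, uint_least32_t */
#include <uchar.h>	/* char16_t, char32_t */

/* Hypothetical stand-in for the typedef in the patch. */
typedef unsigned char example_char8_t;

/* unsigned char is one byte by definition, and a byte has >= 8 bits. */
static_assert(sizeof(example_char8_t) == 1, "char8_t is one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(UCHAR_MAX >= 0xff, "every UTF-8 code unit fits");

/* char16_t and char32_t are the `least' types, so only a lower bound
 * on their width is guaranteed. */
static_assert(_Generic((char16_t)0, uint_least16_t: 1, default: 0),
    "char16_t is uint_least16_t");
static_assert(_Generic((char32_t)0, uint_least32_t: 1, default: 0),
    "char32_t is uint_least32_t");
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "at least 16 bits");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "at least 32 bits");

/* Storage width of the stand-in type in bits: CHAR_BIT, possibly > 8. */
static unsigned
example_char8_bits(void)
{
	return (unsigned)(sizeof(example_char8_t) * CHAR_BIT);
}
```

None of these assertions requires CHAR_BIT == 8, which is why the
typedef needs no such guard.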

> i like how there's a lot of tests that someone else wrote :)

We should probably add some more c8rtomb and mbrtoc8 tests.  I just
adapted the ones I found in FreeBSD for c16rtomb and mbrtoc16, and
they don't exercise the full range of invalid UTF-8 byte shapes,
UTF-8 byte sequence lengths, or forbidden redundant (overlong)
encodings.
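
For illustration only, here is a sketch of the kinds of inputs such
tests might cover.  It does not call mbrtoc8 and is not taken from the
patch or from the FreeBSD tests: example_utf8_seqlen,
example_vectors, and example_run_vectors are hypothetical names, and
the classifier just follows the well-formed byte ranges of UTF-8 so
that each vector has a known expected result.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: classify the first UTF-8 sequence in s[0..n).
 * Returns its length (1..4) if well-formed, 0 if it is a valid prefix
 * that was cut short, and -1 if it is invalid.
 */
static int
example_utf8_seqlen(const unsigned char *s, size_t n)
{
	unsigned char lo = 0x80, hi = 0xbf;	/* range of 2nd byte */
	int len, i;

	if (n == 0)
		return 0;
	if (s[0] <= 0x7f)
		return 1;
	if (s[0] >= 0xc2 && s[0] <= 0xdf) {
		len = 2;
	} else if (s[0] >= 0xe0 && s[0] <= 0xef) {
		len = 3;
		if (s[0] == 0xe0)
			lo = 0xa0;	/* reject overlong 3-byte forms */
		if (s[0] == 0xed)
			hi = 0x9f;	/* reject surrogates D800..DFFF */
	} else if (s[0] >= 0xf0 && s[0] <= 0xf4) {
		len = 4;
		if (s[0] == 0xf0)
			lo = 0x90;	/* reject overlong 4-byte forms */
		if (s[0] == 0xf4)
			hi = 0x8f;	/* reject values above U+10FFFF */
	} else {
		/* stray continuation, C0/C1 overlong lead, or F5..FF */
		return -1;
	}
	for (i = 1; i < len; i++) {
		if ((size_t)i >= n)
			return 0;	/* truncated sequence */
		if (s[i] < lo || s[i] > hi)
			return -1;
		lo = 0x80;		/* later bytes: plain continuation */
		hi = 0xbf;
	}
	return len;
}

/* Sample vectors, one per shape mentioned above. */
static const struct {
	unsigned char b[4];
	size_t n;
	int expect;
} example_vectors[] = {
	{ { 0x41 }, 1, 1 },			/* U+0041, 1 byte */
	{ { 0xc3, 0xa9 }, 2, 2 },		/* U+00E9, 2 bytes */
	{ { 0xe2, 0x82, 0xac }, 3, 3 },		/* U+20AC, 3 bytes */
	{ { 0xf0, 0x9f, 0x98, 0x80 }, 4, 4 },	/* U+1F600, 4 bytes */
	{ { 0xe2, 0x82 }, 2, 0 },		/* truncated */
	{ { 0x80 }, 1, -1 },			/* stray continuation */
	{ { 0xc0, 0x80 }, 2, -1 },		/* overlong U+0000 */
	{ { 0xe0, 0x80, 0x80 }, 3, -1 },	/* overlong, 3 bytes */
	{ { 0xf0, 0x80, 0x80, 0x80 }, 4, -1 },	/* overlong, 4 bytes */
	{ { 0xed, 0xa0, 0x80 }, 3, -1 },	/* surrogate U+D800 */
	{ { 0xf4, 0x90, 0x80, 0x80 }, 4, -1 },	/* above U+10FFFF */
	{ { 0xff }, 1, -1 },			/* never valid in UTF-8 */
};

/* Returns the number of vectors whose classification is unexpected. */
static int
example_run_vectors(void)
{
	size_t i;
	int bad = 0;

	for (i = 0; i < sizeof(example_vectors)/sizeof(example_vectors[0]);
	    i++) {
		if (example_utf8_seqlen(example_vectors[i].b,
		    example_vectors[i].n) != example_vectors[i].expect)
			bad++;
	}
	return bad;
}
```

A real test would feed the same byte strings to mbrtoc8 in a UTF-8
locale and compare its result against the expected classification,
rather than use a private classifier like this one.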

