NetBSD-Bugs archive

lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences



>Number:         58612
>Category:       lib
>Synopsis:       c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Aug 17 19:40:00 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current
>Organization:
The NetBSD Shift Sequence tomb
>Environment:
>Description:
The new c8rtomb/c16rtomb/c32rtomb functions in libc, introduced in C11 (and C23 for c8rtomb), use _citrus_iconv_convert to convert a single Unicode scalar value (specifically, a four-byte UTF-32LE byte sequence) to the locale-dependent multibyte character encoding.

Other than buffering the UTF-8/16 decoding in the cases of c8rtomb and c16rtomb, this conversion is stateless, so whenever it produces a shift sequence to a non-initial state, such as ESC ( J in ISO-2022-JP to switch from US-ASCII to ISO/IEC 646:JP, it also produces a shift sequence back to the initial state in the same call.

Although this output may be correct, it is suboptimal -- and may not fit in the output buffer of MB_CUR_MAX bytes.
>How-To-Repeat:
char buf[128];
c16rtomb(&buf[0], L'A', NULL);    /* LATIN CAPITAL LETTER A */
c16rtomb(&buf[1], 0xa5, NULL);    /* YEN SIGN */

This should produce four bytes of output (three bytes to shift from US-ASCII to ISO/IEC 646:JP, one byte for YEN SIGN in ISO/IEC 646:JP), but instead it produces seven bytes (shift, YEN SIGN, shift back).  A subsequent c16rtomb with U+00A5 (YEN SIGN) should only produce another one byte of output because it has already shifted to ISO/IEC 646:JP, but instead it produces another seven bytes.
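
To make the byte counts easier to observe, here is a minimal sketch of two helpers.  The names append_c16 and dump_bytes are hypothetical and not part of libc.  The first records how many bytes each c16rtomb call emits and refuses to write when fewer than MB_LEN_MAX bytes of space remain; the second prints the accumulated output in hex.

```c
#include <limits.h>
#include <stdio.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of libc): convert one UTF-16 code unit
 * with c16rtomb, append the bytes to out at offset *len, and return how
 * many bytes this call produced.  Returns (size_t)-1 if the conversion
 * fails or if fewer than MB_LEN_MAX bytes of space remain.
 */
size_t
append_c16(char *out, size_t outsize, size_t *len, char16_t c16,
    mbstate_t *ps)
{
	size_t n;

	if (*len > outsize || outsize - *len < MB_LEN_MAX)
		return (size_t)-1;
	n = c16rtomb(out + *len, c16, ps);
	if (n != (size_t)-1)
		*len += n;
	return n;
}

/* Print the first len bytes of buf in hex, e.g. "1b 28 4a 5c". */
void
dump_bytes(const char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		printf("%s%02x", i ? " " : "", (unsigned char)buf[i]);
	printf("\n");
}
```

Assuming the program has selected an ISO-2022-JP locale with setlocale(LC_CTYPE, "") and starts from a zero-initialized mbstate_t, calling append_c16 with u'A', 0xa5, and 0xa5 again should return 1, 4, and 1 according to the expectation above; with the behaviour described in this report it would return 1, 7, and 7.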
>Fix:
Figure out how to use the internal Citrus API to convert a Unicode scalar value to locale-dependent wchar_t, and use wcrtomb instead of _citrus_iconv_convert in c32rtomb in order to produce the output with state.  (Since c8rtomb and c16rtomb are defined in terms of c32rtomb, nothing else is needed for them.)
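
A rough sketch of the shape this could take, not the actual libc change: ucs_to_wchar_stub is a HYPOTHETICAL placeholder for the still-unknown Citrus conversion and only handles US-ASCII, and c32rtomb_sketch is likewise an illustrative name.  The point is only that delegating to wcrtomb lets the shift state live in the caller's mbstate_t across calls.

```c
#include <errno.h>
#include <stdbool.h>
#include <uchar.h>
#include <wchar.h>

/*
 * HYPOTHETICAL placeholder for the missing piece: map a Unicode scalar
 * value to the locale's wchar_t.  This stub only handles US-ASCII,
 * assuming wchar_t values for ASCII equal their code points; the real
 * mapping would have to come from the locale's Citrus module.
 */
bool
ucs_to_wchar_stub(char32_t c32, wchar_t *wcp)
{
	if (c32 > 0x7f)
		return false;
	*wcp = (wchar_t)c32;
	return true;
}

/*
 * Sketch of a c32rtomb that delegates to wcrtomb, so any shift state
 * persists in *ps between calls instead of being reset every time.
 */
size_t
c32rtomb_sketch(char *s, char32_t c32, mbstate_t *ps)
{
	static mbstate_t internal;	/* used when the caller passes NULL */
	wchar_t wc;

	if (ps == NULL)
		ps = &internal;
	if (s == NULL)		/* same as converting a null character */
		return wcrtomb(NULL, L'\0', ps);
	if (!ucs_to_wchar_stub(c32, &wc)) {
		errno = EILSEQ;
		return (size_t)-1;
	}
	return wcrtomb(s, wc, ps);
}
```

With this structure, a shift back to the initial state would be emitted only when wcrtomb itself decides one is needed, for example when a null character is converted.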


