NetBSD-Bugs archive
lib/58612: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
>Number: 58612
>Category: lib
>Synopsis: c8rtomb/c16rtomb/c32rtomb yield suboptimal shift sequences
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: lib-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Aug 17 19:40:00 +0000 2024
>Originator: Taylor R Campbell
>Release: current
>Organization:
The NetBSD Shift Sequence tomb
>Environment:
>Description:
The new c8rtomb/c16rtomb/c32rtomb functions in libc, introduced in C11 (and C23 for c8rtomb), use _citrus_iconv_convert to convert a single Unicode scalar value (specifically, a four-byte UTF-32LE byte sequence) to the locale-dependent multibyte character encoding.
Other than buffering the UTF-8/16 decoding in the cases of c8rtomb and c16rtomb, this conversion is stateless: each call starts from the initial shift state and returns to it. So whenever a call produces a shift sequence to a non-initial state, such as ESC ( J in ISO-2022-JP to switch from US-ASCII to ISO/IEC 646:JP, it also produces a shift sequence back to the initial state.
Although this output may be correct, it is suboptimal -- and may not fit in the output buffer of MB_CUR_MAX bytes.
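The difference can be sketched with a toy encoder that knows only two ISO-2022-JP character sets. This is not the Citrus code; every toy_* name is hypothetical and exists only to make the byte counts concrete. It assumes the standard escape sequences ESC ( B (US-ASCII) and ESC ( J (ISO/IEC 646:JP), with YEN SIGN at 0x5c in the latter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model: only two ISO-2022-JP character sets. */
enum toy_charset { TOY_ASCII, TOY_JIS_ROMAN };

/* Write the 3-byte escape sequence designating cs; return its length. */
static size_t
toy_shift(char *out, enum toy_charset cs)
{
	memcpy(out, cs == TOY_JIS_ROMAN ? "\x1b(J" : "\x1b(B", 3);
	return 3;
}

/* Map a code point to (charset, byte); return 0 if the toy lacks it. */
static int
toy_lookup(uint32_t cp, enum toy_charset *cs, char *byte)
{
	if (cp == 0x00a5) {		/* YEN SIGN */
		*cs = TOY_JIS_ROMAN;
		*byte = 0x5c;
		return 1;
	}
	if (cp >= 0x20 && cp < 0x7f) {	/* printable US-ASCII */
		*cs = TOY_ASCII;
		*byte = (char)cp;
		return 1;
	}
	return 0;
}

/* Stateful: emit a shift only when the charset differs from *state. */
static size_t
toy_encode_stateful(char *out, uint32_t cp, enum toy_charset *state)
{
	enum toy_charset cs;
	char byte;
	size_t n = 0;

	assert(out != NULL && state != NULL);
	if (!toy_lookup(cp, &cs, &byte))
		return (size_t)-1;
	if (cs != *state) {
		n += toy_shift(out, cs);
		*state = cs;
	}
	out[n++] = byte;
	return n;
}

/* Stateless: start in the initial state and always shift back to it. */
static size_t
toy_encode_stateless(char *out, uint32_t cp)
{
	enum toy_charset state = TOY_ASCII;
	size_t n = toy_encode_stateful(out, cp, &state);

	if (n != (size_t)-1 && state != TOY_ASCII)
		n += toy_shift(out + n, TOY_ASCII);
	return n;
}
```

In this model the stateless encoder spends seven bytes on every YEN SIGN, while the stateful one spends four bytes on the first and one byte on each YEN SIGN that immediately follows.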
>How-To-Repeat:
char buf[128];
c16rtomb(&buf[0], u'A', NULL); /* LATIN CAPITAL LETTER A */
c16rtomb(&buf[1], 0x00a5, NULL); /* YEN SIGN */
The second call should produce four bytes of output (three bytes to shift from US-ASCII to ISO/IEC 646:JP, one byte for YEN SIGN in ISO/IEC 646:JP), but instead it produces seven bytes (shift, YEN SIGN, shift back). A subsequent c16rtomb with U+00A5 (YEN SIGN) should produce only one more byte of output because the state has already shifted to ISO/IEC 646:JP, but instead it produces another seven bytes.
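A fuller version of the fragment above might look like the following sketch. The helper name shift_repro is hypothetical. The caller has to supply the name of an ISO-2022-JP locale, which varies between systems and is not stated in this report. The sketch uses an explicit mbstate_t instead of NULL so the counts do not depend on earlier calls in the same process.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert U+0041, U+00A5, U+00A5 with c16rtomb
 * under the named locale and record the byte count of each call.
 * Changes the process-wide LC_CTYPE locale and does not restore it.
 * Returns 0 on success, -1 if the locale is unavailable or a
 * conversion fails.
 */
static int
shift_repro(const char *locale_name, size_t nbytes[3])
{
	static const char16_t input[3] = { 0x0041, 0x00a5, 0x00a5 };
	char buf[128];	/* far larger than needed, as in the report */
	mbstate_t ps;
	size_t off = 0;

	assert(locale_name != NULL && nbytes != NULL);
	if (setlocale(LC_CTYPE, locale_name) == NULL)
		return -1;
	memset(&ps, 0, sizeof(ps));	/* initial conversion state */
	for (int i = 0; i < 3; i++) {
		size_t n = c16rtomb(&buf[off], input[i], &ps);

		if (n == (size_t)-1)
			return -1;
		nbytes[i] = n;
		off += n;
	}
	return 0;
}
```

Going by the counts in this report, nbytes[1] and nbytes[2] would be 7 and 7 on an affected libc, and 4 and 1 once the conversion keeps its shift state between calls.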
>Fix:
Figure out how to use the internal Citrus API to convert a Unicode scalar value to locale-dependent wchar_t, and use wcrtomb instead of _citrus_iconv_convert in c32rtomb in order to produce the output with state. (Since c8rtomb and c16rtomb are defined in terms of c32rtomb, nothing else is needed for them.)