Subject: Re: lib/36938: mbtowc misbehaving after invalid char sequence
To: None <lib-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Takehiko NOZAKI <th-nozaki@netwrk.co.jp>
List: netbsd-bugs
Date: 11/13/2007 17:15:05
The following reply was made to PR lib/36938; it has been noted by GNATS.
From: Takehiko NOZAKI <th-nozaki@netwrk.co.jp>
To: gnats-bugs@netbsd.org
Cc: neil@daikokuya.co.uk
Subject: Re: lib/36938: mbtowc misbehaving after invalid char sequence
Date: Wed, 14 Nov 2007 00:19:13 +0900
hi, Neil.
> tnozaki marked this bug closed, but it seems did not understand
> the report.
>
the current src/lib/libc/citrus/modules/citrus_utf8.c
(and other multibyte encoding modules) implementation:
219         /* make sure we have the first byte in the buffer */
220         if (psenc->chlen == 0) {
221                 if (n-- < 1)
222                         goto restart;
223                 psenc->ch[psenc->chlen++] = *s0++;
224         }
225
226         c = _UTF8_count[psenc->ch[0] & 0xff];
227         if (c < 1 || c < psenc->chlen)
228                 goto ilseq;
- reads the first byte into the internal state (line 223).
- then checks whether it is a valid character (lines 226-227).
so the internal state always ends up in a ``non-initial'' state,
even when that byte is invalid.
OTOH many mbtowc(3) implementations
(AFAIK glibc2, Solaris, FreeBSD, MSVC++6) seem to:
- check whether the first byte is a valid character (if invalid, return -1).
- only then store it into the internal state for restart.
so the internal state remains in the ``initial'' state.
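
to make the difference between the two orderings concrete, here is a
minimal sketch. it is not the citrus code or any libc's code: the
struct toy_state, toy_count(), store_then_check() and check_then_store()
names are all hypothetical, only the first byte is handled, and whatever
cleanup a real implementation does on its error path is ignored.

```c
#include <stddef.h>

/* Hypothetical toy state, loosely modeled on the chlen/ch fields quoted above. */
struct toy_state {
    unsigned char ch[6];
    size_t chlen;
};

/* Toy lead-byte classifier (an assumption, not _UTF8_count):
 * returns the sequence length, or 0 for an invalid lead byte. */
static int toy_count(unsigned char b)
{
    if (b < 0x80) return 1;
    if (b >= 0xc2 && b <= 0xdf) return 2;
    if (b >= 0xe0 && b <= 0xef) return 3;
    if (b >= 0xf0 && b <= 0xf4) return 4;
    return 0;
}

/* Ordering A: store the byte first, validate afterwards.
 * On failure the state still holds the bad byte (non-initial). */
static int store_then_check(struct toy_state *st, unsigned char b)
{
    int c;

    if (st->chlen == 0)
        st->ch[st->chlen++] = b;
    c = toy_count(st->ch[0]);
    if (c < 1 || (size_t)c < st->chlen)
        return -1;
    return c;
}

/* Ordering B: validate first, store only a valid byte.
 * On failure the state is untouched (still initial). */
static int check_then_store(struct toy_state *st, unsigned char b)
{
    int c = toy_count(b);

    if (c < 1)
        return -1;
    st->ch[st->chlen++] = b;
    return c;
}
```

feeding an invalid lead byte such as 0xff to both functions returns -1
in each case, but only ordering A leaves chlen at 1 afterwards.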
but ``how to store pieces of a multibyte sequence in the internal state''
is implementation-defined behavior, because the SUSv3 documentation
doesn't mention it (correct me if i'm wrong).
http://opengroup.org/onlinepubs/007908799/xsh/mbtowc.html
# in the case of mbrtowc(3) and mbstate_t,
# "the conversion state is undefined" when the return value is (size_t)-1.
#
# http://opengroup.org/onlinepubs/007908799/xsh/mbrtowc.html
# http://opengroup.org/onlinepubs/007908799/xsh/wchar.h.html
so, whether the current locale is stateless or stateful,
you cannot omit the re-initialization of mbtowc(3)'s internal state
by #if 0'ing it out, i think.
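
as an illustration of what i mean by re-initializing, here is a rough
sketch of a caller that resets the state after every failure. the
helper name count_chars_with_reset() is made up for this mail, and
skipping exactly one byte after an error is just one possible recovery
policy, not something the standard prescribes. the only library call
used is mbtowc(3) itself; calling it with a null s pointer puts its
internal state back into the initial state.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: counts the characters decoded from s[0..n),
 * skipping one byte and resetting the state whenever mbtowc() fails. */
static size_t count_chars_with_reset(const char *s, size_t n)
{
    size_t count = 0;

    (void)mbtowc(NULL, NULL, 0);            /* start from the initial state */
    while (n > 0) {
        wchar_t wc;
        int len = mbtowc(&wc, s, n);

        if (len < 0) {
            /* do not assume the state is still usable after -1 */
            (void)mbtowc(NULL, NULL, 0);
            s++;
            n--;
            continue;
        }
        if (len == 0)                       /* null character: stop */
            break;
        s += len;
        n -= (size_t)len;
        count++;
    }
    return count;
}
```

with the explicit reset in place, the caller no longer depends on
whether a given libc stores the byte before or after validating it.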
...but we are in the minority, so we might change this behavior in the future.
very truly yours.
--
Takehiko NOZAKI <tnozaki@NetBSD.org>