netbsd-bugs: lib/36938: mbtowc misbehaving after invalid char sequence

Subject: lib/36938: mbtowc misbehaving after invalid char sequence
To: lib-bug-people@netbsd.org, gnats-admin@netbsd.org
From: neil@daikokuya.co.uk
List: netbsd-bugs
Date: 09/06/2007 13:15:00

>Number:         36938
>Category:       lib
>Synopsis:       mbtowc fails converting valid sequences after invalid one
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Sep 06 13:15:00 +0000 2007
>Originator:     neil@daikokuya.co.uk
>Release:        NetBSD 4.99.23
>Organization:
>Environment:
System: NetBSD duron.akihabara.co.uk 4.99.23 NetBSD 4.99.23 (GENERIC) #0: Sun Jul 15 10:39:38 JST 2007 root@duron.akihabara.co.uk:/usr/src/sys/arch/i386/compile/GENERIC i386
	libc.so.12.150
Architecture: i386
Machine: i386
>Description:

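See the commented example below.  After mbtowc rejects an invalid
sequence, it fails to convert a subsequent valid sequence, even
after the conversion state is reset by calling mbtowc with a null
pointer.  This is not limited to UTF-8; it also happens for other
encodings, so I believe the problem is generic, if indeed it is a
bug.  If it is not a bug, mbtowc would seem to be useless in
practice, since a caller could not continue converting input after
meeting a single invalid sequence.  The same code succeeds on Linux.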

#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Valid 2-byte Shift_JIS character; not a valid UTF-8 sequence.  */
const char sjis[] = "\x95\x5c";
/* Valid UTF-8, of course.  */
const char space[] = " ";

int main (void)
{
  wchar_t wc;

  /* Fail early if the locale is not installed.  */
  assert (setlocale (LC_CTYPE, "ja_JP.UTF-8") != NULL);

  /* Assert it is not state-dependent.  */
  assert (mbtowc (&wc, 0, 1) == 0);

  /* Assert my charset beliefs.  */
  assert (mbtowc (&wc, space, sizeof space) == 1);
  assert (mbtowc (&wc, sjis, sizeof sjis) == -1);

  /* Unnecessary assertion that we're not state-dependent, but
     just in case some state needs resetting.  */
  assert (mbtowc (&wc, 0, 1) == 0);

  /* This assertion fails on NetBSD, incorrectly I believe.  */
  assert (mbtowc (&wc, space, sizeof space) == 1);

  return 0;
}
>How-To-Repeat:
	Compile and run the program above.
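	To narrow the problem down, a variant of the test using the
	restartable interface may help: mbrtowc with a caller-owned
	mbstate_t that is zeroed after the error, so that no hidden
	internal state is involved.  This is only a sketch and has not
	been run on NetBSD 4.99.23; probe_recovery and enum
	probe_result are hypothetical names, and the locale passed in
	is assumed to be installed.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical result codes for this diagnostic sketch.  */
enum probe_result
{
  PROBE_OK,          /* valid sequence converted after the error */
  PROBE_NO_LOCALE,   /* setlocale could not select the locale */
  PROBE_BAD_SETUP,   /* charset assumptions did not hold */
  PROBE_STUCK        /* valid sequence rejected after the error */
};

/* Hypothetical helper: repeat the mbtowc test with mbrtowc and an
   explicit mbstate_t, reinitialised after the conversion error.
   Note: this changes the program's LC_CTYPE locale.  */
enum probe_result
probe_recovery (const char *locale_name)
{
  const char sjis[] = "\x95\x5c";   /* Shift_JIS, not valid UTF-8 */
  const char space[] = " ";
  mbstate_t st;
  wchar_t wc;

  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return PROBE_NO_LOCALE;

  memset (&st, 0, sizeof st);
  if (mbrtowc (&wc, space, sizeof space, &st) != 1)
    return PROBE_BAD_SETUP;
  if (mbrtowc (&wc, sjis, sizeof sjis, &st) != (size_t) -1)
    return PROBE_BAD_SETUP;

  /* After (size_t) -1 the state is unspecified; start afresh.  */
  memset (&st, 0, sizeof st);
  if (mbrtowc (&wc, space, sizeof space, &st) != 1)
    return PROBE_STUCK;

  return PROBE_OK;
}
```

	Called as probe_recovery ("ja_JP.UTF-8") from main, a result of
	PROBE_OK while the mbtowc test still fails would suggest the
	problem lies in mbtowc's internal state handling; PROBE_STUCK
	would suggest it lies deeper in the locale's conversion code.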
>Fix:
	Unknown

>Unformatted:
 	Around Jul 15 2007