Subject: lib/20873: UTF8 mbrtowc() doesn't return -1 when given illegal UTF8 sequence
To: None <gnats-bugs@gnats.netbsd.org>
From: None <khym@azeotrope.org>
List: netbsd-bugs
Date: 03/24/2003 04:03:05
>Number: 20873
>Category: lib
>Synopsis: UTF8 mbrtowc() doesn't return -1 when given illegal UTF8 sequence
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: lib-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Mar 24 02:04:00 PST 2003
>Closed-Date:
>Last-Modified:
>Originator: Dave Huang
>Release: NetBSD-current as of March 23, 2003
>Organization:
Name: Dave Huang | Mammal, mammal / their names are called /
INet: khym@azeotrope.org | they raise a paw / the bat, the cat /
FurryMUCK: Dahan | dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 27 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++
>Environment:
System: NetBSD fluff.azeotrope.org 1.6P NetBSD 1.6P (FLUFF) #10: Sun Mar 9 21:06:23 CST 2003 khym@fluff.azeotrope.org:/usr/obj.i386/FLUFF i386
Architecture: i386
Machine: i386
$NetBSD: citrus_utf8.c,v 1.6 2002/03/28 10:53:49 yamt Exp $
>Description:
mbrtowc(3) is supposed to return (size_t)-1 when it is given an
illegal multibyte sequence. In a UTF-8 locale, however, the ilseq
error path in citrus_utf8.c returns EILSEQ without storing anything
in *nresult, so mbrtowc() hands the caller an uninitialized value.
>How-To-Repeat:
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
	/* 0xa7 can never be the first byte of a UTF-8 sequence */
	char s[] = "\xa7";
	wchar_t wc;
	mbstate_t mbstate;
	size_t r;

	setlocale(LC_ALL, "");

	/* initialize mbstate */
	r = mbrtowc(NULL, NULL, 0, &mbstate);

	r = mbrtowc(&wc, s, strlen(s), &mbstate);
	printf("mbrtowc returned %d, errno = %d\n", (int)r, errno);

	return 0;
}

% env LC_ALL=en_US.UTF-8 ./test_mbrtowc
mbrtowc returned 536973768, errno = 85
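
A stricter form of the same check may be useful for a regression test.
The sketch below is only an illustration: first_char_is_illegal() is a
hypothetical helper, not part of libc. It zeroes the mbstate_t with
memset(), clears errno before the call, and compares the result against
(size_t)-1 directly instead of printing it through an int.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not a libc function): decode the first character
 * of s and report whether mbrtowc() flagged an illegal sequence.
 * Stores the raw mbrtowc() result in *ret and the resulting errno in
 * *err.  Returns 1 if mbrtowc() returned (size_t)-1, 0 otherwise.
 */
int
first_char_is_illegal(const char *s, size_t *ret, int *err)
{
	mbstate_t st;
	wchar_t wc;

	assert(s != NULL && ret != NULL && err != NULL);

	/* An all-zero mbstate_t describes the initial conversion state. */
	memset(&st, 0, sizeof(st));

	/* Clear errno so a stale value is not mistaken for EILSEQ. */
	errno = 0;
	*ret = mbrtowc(&wc, s, strlen(s), &st);
	*err = errno;

	return *ret == (size_t)-1;
}
```

After setlocale(LC_ALL, "") in a UTF-8 locale, calling this helper on
"\xa7" should return 1 with *err set to EILSEQ once the bug is fixed.
What it reports in other locales depends on the locale's encoding.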
>Fix:
Index: citrus_utf8.c
===================================================================
RCS file: /cvsroot/src/lib/libc/citrus/modules/citrus_utf8.c,v
retrieving revision 1.6
diff -u -r1.6 citrus_utf8.c
--- citrus_utf8.c 2002/03/28 10:53:49 1.6
+++ citrus_utf8.c 2003/03/24 09:47:15
@@ -276,6 +276,7 @@
 
 ilseq:
 	psenc->chlen = 0;
+	*nresult = (size_t)-1;
 	return (EILSEQ);
 
 restart:
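
To show why the missing store matters, here is a toy model of the
calling convention visible in the patch: the decoder returns an errno
code and reports the mbrtowc()-style result through *nresult. The
names decode_lead_byte() and decode_wrapper() are hypothetical, the
decoding is reduced to a single lead-byte check, and the shape of the
wrapper is an assumption about how the caller uses the two values, not
a copy of the libc source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Toy decoder (hypothetical, not the citrus code).  Returns 0 or an
 * errno code; the mbrtowc()-style result goes through *nresult.
 */
int
decode_lead_byte(unsigned char c, size_t *nresult)
{
	assert(nresult != NULL);

	if (c >= 0x80 && c < 0xc0) {
		/* 10xxxxxx is a continuation byte; it cannot start a sequence. */
		*nresult = (size_t)-1;	/* the store the patch adds */
		return EILSEQ;
	}

	/* Simplified: treat every other byte as one complete character. */
	*nresult = 1;
	return 0;
}

/*
 * Assumed caller shape: pass the error code on via errno and return
 * whatever the decoder stored.  If an error path skips the store, ret
 * is returned uninitialized, which is the garbage seen in this report.
 */
size_t
decode_wrapper(unsigned char c)
{
	size_t ret;
	int err;

	err = decode_lead_byte(c, &ret);
	if (err != 0)
		errno = err;
	return ret;
}
```

With the store in place, decode_wrapper(0xa7) yields (size_t)-1 and
sets errno to EILSEQ. Removing that one assignment would reproduce the
failure mode described above.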
>Release-Note:
>Audit-Trail:
>Unformatted: