Subject: Valid and incomplete character sequence and mbrlen()
To: None <tech-userlevel@netbsd.org>
From: Masao Uebayashi <uebayasi@soum.co.jp>
List: tech-userlevel
Date: 10/28/2001 22:59:31
If I read SUSv2 correctly, mbrlen() applied to an incomplete but
potentially valid character sequence should return (size_t)-2. SUSv2 says:
--------8<--------8<--------8<--------8<--------8<--------8<--------8<
RETURN VALUE

  The mbrlen() function returns the first of the following that applies:

  0
      If the next n or fewer bytes complete the character that
      corresponds to the null wide-character.

  positive
      If the next n or fewer bytes complete a valid character; the value
      returned is the number of bytes that complete the character.

  (size_t)-2
      If the next n bytes contribute to an incomplete but potentially
      valid character, and all n bytes have been processed. When n has
      at least the value of the MB_CUR_MAX macro, this case can only
      occur if s points at a sequence of redundant shift sequences (for
      implementations with state-dependent encodings).

  (size_t)-1
      If an encoding error occurs, in which case the next n or fewer
      bytes do not contribute to a complete and valid character. In this
      case, EILSEQ is stored in errno and the conversion state is
      undefined.
--------8<--------8<--------8<--------8<--------8<--------8<--------8<
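To make the four cases above concrete, here is a minimal sketch of how a caller might tell them apart. The names seq_status, seq_classify() and seq_probe() are hypothetical helpers made up for illustration; only mbrlen() and mbstate_t come from the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical names for the four SUSv2 cases; not a standard API. */
enum seq_status { SEQ_NUL, SEQ_COMPLETE, SEQ_INCOMPLETE, SEQ_INVALID };

/* Map a raw mbrlen() return value onto one of the four cases. */
static enum seq_status
seq_classify(size_t ret)
{
	if (ret == (size_t)-1)
		return SEQ_INVALID;	/* encoding error, errno is EILSEQ */
	if (ret == (size_t)-2)
		return SEQ_INCOMPLETE;	/* all n bytes used, need more */
	if (ret == 0)
		return SEQ_NUL;		/* completed the null character */
	return SEQ_COMPLETE;		/* ret bytes completed a character */
}

/* Examine the first n bytes of s, starting from the initial state. */
static enum seq_status
seq_probe(const char *s, size_t n)
{
	mbstate_t st;

	assert(s != NULL);
	memset(&st, 0, sizeof(st));	/* initial conversion state */
	return seq_classify(mbrlen(s, n, &st));
}
```

The two sentinel values have to be compared as (size_t)-1 and (size_t)-2, because the return type is unsigned and a test such as "ret < 0" is never true.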
For example, what does the following program print with the locale code
in -current? Here, calling mbrlen() on s with 1, 2, 3 and 4 as the
second argument should return (size_t)-2.
--------8<--------8<--------8<--------8<--------8<--------8<--------8<
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * A multibyte string in ISO-2022-JP.  The first 3 bytes are a shift
 * sequence, the next 6 bytes are JIS X 0208 characters, the last 3
 * are also a shift sequence.
 */
const char s[] =
{
	0x1b, 0x24, 0x42,			/* JIS X 0208 */
	0x46, 0x7c, 0x4b, 0x5c, 0x38, 0x6c,	/* 日本語 */
	0x1b, 0x28, 0x42,			/* ASCII */
	'\0'
};

int
main(void)
{
	mbstate_t *ps;
	size_t ret;
	int i;

	if (setlocale(LC_ALL, "ja_JP.ISO2022-JP") == NULL)
		exit(EXIT_FAILURE);

	printf("%s\n", s);

	ps = malloc(sizeof(mbstate_t));
	if (ps == NULL)
		exit(EXIT_FAILURE);

	for (i = 0; i < (int)strlen(s); ++i) {
		/* Start every call from the initial conversion state. */
		memset(ps, 0, sizeof(mbstate_t));

		/* Ask about the first i bytes of s only. */
		ret = mbrlen(s, i, ps);

		/* Cast so (size_t)-1 and (size_t)-2 print as -1 and -2. */
		printf("%d: %ld\n", i, (long)ret);
	}

	free(ps);
	return 0;
}
--------8<--------8<--------8<--------8<--------8<--------8<--------8<
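The (size_t)-2 case matters mostly to callers that receive input in pieces. As a rough sketch, assuming the conversion state is kept between calls, a byte-at-a-time character counter could look like this. count_bytewise() is a hypothetical name, and the sketch does not report input that ends in the middle of a character.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Count the characters in the len bytes at s, feeding mbrlen() one
 * byte per call.  Returns (size_t)-1 on an encoding error.
 */
static size_t
count_bytewise(const char *s, size_t len)
{
	mbstate_t st;
	size_t i, r, count = 0;

	memset(&st, 0, sizeof(st));	/* initial conversion state */
	for (i = 0; i < len; i++) {
		r = mbrlen(&s[i], 1, &st);
		if (r == (size_t)-1)
			return (size_t)-1;	/* invalid sequence */
		if (r == (size_t)-2)
			continue;	/* partial character kept in st */
		count++;		/* this byte completed a character */
	}
	return count;
}
```

Unlike the test program, this loop deliberately does not reset the state between calls: the partial character or shift sequence remembered in st is what lets the next byte complete it.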
I'm using XPG4DL on 1.5, so I apologize if this problem does not occur
on -current.
Regards,
Masao