tech-userlevel archive
Re: A draft for a multibyte and multi-codepoint C string interface
> Non-NUL UTF8 sequences can contain bytes with value 0,
How? As far as I can see, the only way to get a 0 octet into a
UTF-8-encoded string is to encode Unicode codepoint 0. RFC 3629 seems
to think so too:
   o  Character numbers from U+0000 to U+007F (US-ASCII repertoire)
      correspond to octets 00 to 7F (7 bit US-ASCII values). [...]
[page break]
   o  US-ASCII octet values do not appear otherwise in a UTF-8 encoded
      character stream. [...]
and, as far as I can see, the encoding actually described does indeed
have those properties.
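For what it's worth, the claim is easy to check by brute force. What
follows is only a rough sketch of my own, assuming a straightforward
encoder written from the bit layout RFC 3629 describes; the names
utf8_encode and count_zero_octets are invented for this message and
come from neither the RFC nor the draft. It encodes every codepoint
other than 0 and counts how many encodings contain a 0 octet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one codepoint as UTF-8 following the RFC 3629 bit layout.
   Returns the number of octets written (1..4), or 0 if cp is a
   surrogate or lies beyond U+10FFFF.  Hypothetical helper. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)    /* surrogates are not encodable */
        return 0;
    if (cp <= 0xFFFF) {                  /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Count the codepoints other than U+0000 whose encoding contains a
   0 octet.  Hypothetical helper. */
static unsigned long count_zero_octets(void)
{
    unsigned long hits = 0;
    for (uint32_t cp = 1; cp <= 0x10FFFF; cp++) {
        unsigned char buf[4];
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue;                    /* skip surrogates */
        size_t n = utf8_encode(cp, buf);
        assert(n >= 1 && n <= 4);
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == 0) {
                hits++;
                break;
            }
        }
    }
    return hits;
}
```

The count cannot be anything but zero: every lead octet of a
multi-octet sequence is at least 0xC0, every continuation octet is at
least 0x80, and a single-octet encoding is just the codepoint itself.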
What am I missing?
/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B