tech-userlevel archive
Re: A draft for a multibyte and multi-codepoint C string interface
>If Pike and Thompson think the syscall interface should be opaque octet
>strings, with UTF-8 awareness limited to userland, I agree with them.
In the "Hello World" paper they technically did not address the system
call interface. From that paper:
    Little change was required: null-terminated UTF strings are
    equivalent to null-terminated ASCII strings for most purposes of the
    operating system.
Note that they said ASCII, not "opaque octet strings". Also:
    There are a couple of aspects of the Unicode Standard we have not
    faced. One is the issue of right-to-left text such as Hebrew or
    Arabic. Since that is an issue of display, not representation, we
    believe we can defer that problem for the moment without affecting
    our ability to solve it later. Another issue is diacriticals and
    ‘combining characters’, which cause overstriking of multiple
    Unicode characters. Although necessary for some scripts, such as
    Thai, Arabic, and Hebrew, such characters confuse the issues for
    Latin languages because they generate multiple representations
    for accented characters. ISO 10646 describes three levels
    of implementation; in Plan 9 we decided not to address the
    issue. Again, this can be labeled as a display issue and its
    finer points are still being debated, so we felt comfortable
    deferring. Mañana.
So they knew it was an issue, and decided not to deal with it.
--Ken