tech-userlevel archive
Re: A draft for a multibyte and multi-codepoint C string interface
On Tue, Apr 02, 2013 at 07:45:42PM -0400, James K. Lowden wrote:
> On Tue, 2 Apr 2013 12:21:03 -0400
> Thor Lancelot Simon <tls%panix.com@localhost> wrote:
>
> > On Tue, Apr 02, 2013 at 06:08:01PM +0200, tlaronde%polynum.com@localhost
> > wrote:
> > >
> > > That UTF-8 is the answer, since it lets programs keep using C "char"
> > > (at least an octet, signed or unsigned).
> >
> > Except it can't, really, quite be UTF-8 -- it has to be "Modified
> > UTF-8", because C strings can't contain 0.
>
> What are you referring to, exactly? UTF-8 and ASCII both represent NUL
> with 0. The filename rule is that only '/' and NUL are prohibited. I
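The trouble is U+0000 itself: UTF-8 encodes it as a single 0x00 byte,
which C string handling treats as the terminator, so text containing
that code point can't be carried in a C string. (Every byte of a
multi-byte UTF-8 sequence has the high bit set, so no other code point
produces a zero byte.) There's a common workaround, encoding U+0000 as
the overlong two-byte sequence 0xC0 0x80, but, technically, once you
apply it, you are no longer compliant with UTF-8. You're emitting
"Modified UTF-8", like Java does. It's the great thing about standards:
pick one...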