Re: A draft for a multibyte and multi-codepoint C string interface

To: tech-userlevel%netbsd.org@localhost
Subject: Re: A draft for a multibyte and multi-codepoint C string interface
From: Thor Lancelot Simon <tls%panix.com@localhost>
Date: Wed, 3 Apr 2013 08:45:57 -0400

On Tue, Apr 02, 2013 at 07:45:42PM -0400, James K. Lowden wrote:
> On Tue, 2 Apr 2013 12:21:03 -0400
> Thor Lancelot Simon <tls%panix.com@localhost> wrote:
> 
> > On Tue, Apr 02, 2013 at 06:08:01PM +0200, tlaronde%polynum.com@localhost wrote:
> > > 
> > > That UTF-8 is the answer, since this allows to use C "char" (at
> > > least an octet, signed or unsigned) programs.
> > 
> > Except it can't, really, quite be UTF-8 -- it has to be "Modified
> > UTF-8", because C strings can't contain 0.
> 
> What are you referring to, exactly?  UTF-8 and ASCII both represent NUL
> with 0.  The filename rule is that only '/' and NUL are prohibited.  I

UTF-8 encodes the code point U+0000 as a single byte with value 0, and a C
string can't carry that byte as data, because it terminates the string.
There's a common workaround (encode U+0000 as the overlong pair 0xC0 0x80),
but, technically, once you apply it, you are no longer compliant with UTF-8.
You're emitting "Modified UTF-8" like Java does.  It's the great thing about
standards: pick one...
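
A minimal sketch of that workaround, assuming the Java-style convention
of writing U+0000 as the two bytes 0xC0 0x80.  The names encode_small()
and strlen_of_a_nul_b() are made up for this illustration, and the
encoder only handles code points up to U+07FF; it is not a complete
UTF-8 implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Encode one code point (U+0000..U+07FF only) into out.
 * If modified is nonzero, U+0000 is written as 0xC0 0x80 ("Modified
 * UTF-8"); otherwise it is written as the single byte 0x00.
 * Returns the number of bytes written, or 0 if cp is out of range.
 * Hypothetical helper, not part of any library.
 */
size_t
encode_small(uint32_t cp, unsigned char *out, int modified)
{
	assert(out != NULL);
	if (cp == 0 && modified) {
		out[0] = 0xC0;	/* overlong form, rejected by strict UTF-8 */
		out[1] = 0x80;
		return 2;
	}
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	return 0;
}

/*
 * Encode 'a', U+0000, 'b' into buf (at least 5 bytes), terminate it,
 * and return what strlen() makes of the result.
 */
size_t
strlen_of_a_nul_b(char *buf, int modified)
{
	unsigned char *p = (unsigned char *)buf;

	p += encode_small('a', p, modified);
	p += encode_small(0, p, modified);
	p += encode_small('b', p, modified);
	*p = 0;
	return strlen(buf);
}
```

With modified set to 0, strlen() stops at the embedded 0 byte and the
'b' is lost to any C string function.  With modified set to 1, all
three code points survive, but a strict UTF-8 decoder is entitled to
reject the 0xC0 0x80 pair.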
