Subject: Re: Unicode support in iso9660.
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Jaromir Dolecek <jdolecek@NetBSD.org>
List: tech-kern
Date: 11/19/2004 21:37:24
der Mouse wrote:
> > I think this could be handled if UTF-8 were the standard encoding for
> > userland<->kernel interaction, yes?
>
> I think this would be a mistake. File names have traditionally been
> opaque octet sequences containing 0x2f only as pathname component
> separator and containing 0x00 only as terminator, not character
> sequences, and I think changing that would be a Wrong Thing.
UTF-8 doesn't change that. UTF-8 stores US-ASCII characters (0-127)
as-is, i.e. as single bytes with values 0-127. Characters outside
US-ASCII are encoded as sequences of bytes with values 128-255. Such
a sequence never contains a byte in the US-ASCII range. So a 0x2f
byte in a UTF-8 string can only ever be '/', and likewise a 0x00 byte
can only ever be the terminator.
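
To illustrate the point, here is a minimal sketch in Python (not kernel
code; the function names are made up for this example). It checks that
no code point outside US-ASCII produces a byte below 0x80, and that
splitting an encoded pathname on 0x2f needs no knowledge of the encoding.

```python
# Illustrative only: the function names are hypothetical, not any real API.

def high_bytes_only(cp):
    """True if every UTF-8 byte of code point cp is in 0x80..0xFF."""
    return all(b >= 0x80 for b in chr(cp).encode("utf-8"))

def all_non_ascii_are_safe():
    """Check every code point above US-ASCII, skipping the surrogate
    range U+D800..U+DFFF, which UTF-8 cannot encode."""
    return all(high_bytes_only(cp)
               for cp in range(0x80, 0x110000)
               if not 0xD800 <= cp <= 0xDFFF)

def split_components(raw):
    """Split a pathname on 0x2f, knowing nothing about the encoding."""
    return raw.split(b"/")

# "dir/" followed by a file name containing non-ASCII letters.
path = "dir/\u017elu\u0165ou\u010dk\u00fd".encode("utf-8")
```

Calling split_components(path) should yield exactly two components,
since the only 0x2f byte in the encoded name is the real separator.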
Jaromir
--
Jaromir Dolecek <jdolecek@NetBSD.org> http://www.NetBSD.cz/
-=- We should be mindful of the potential goal, but as the Buddhist -=-
-=- masters say, ``You may notice during meditation that you -=-
-=- sometimes levitate or glow. Do not let this distract you.'' -=-