tech-userlevel archive
Re: A draft for a multibyte and multi-codepoint C string interface
On Sat, Apr 06, 2013 at 07:47:00PM -0400, Mouse wrote:
>
> I'm talking about the way 0x00 and 0x2f octets are special in pathnames
> at the syscall interface. This is annoying to applications that want
> to name files with fundamentally non-character-string data. The live
> use case I mentioned a message or two ago is an example - that really
> _wants_ to name files with time_t values (actually, time_t plus a
> disambiguator serial number).
I think you will admit that your example of "binary" filenames is not a
common use. It means that, to use these filenames, the utilities have
to know about the convention. A "binary" filename can be represented
without ado as a hexadecimal (or any other base) string. Such a string
is pure ASCII, hence also valid UTF-8, so this is already possible now
without changing everything (and if one really wants, the binary
filenames can carry a prefix giving the base: 0x...).
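To make the idea concrete, here is a minimal sketch in C of naming a
file after a time_t plus a serial number, as in the quoted use case.
The function names ts_to_name and name_to_ts, the "-" separator and
the field widths are assumptions chosen for illustration, not an
existing interface.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: write "0x<16 hex digits>-<8 hex digits>" into buf.
 * The result is pure ASCII, so it contains neither 0x00 nor 0x2f ('/').
 * Returns 0 on success, -1 if buf is too small.
 */
int
ts_to_name(char *buf, size_t len, time_t t, uint32_t serial)
{
	int n = snprintf(buf, len, "0x%016" PRIx64 "-%08" PRIx32,
	    (uint64_t)t, serial);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical inverse: parse a name produced by ts_to_name().
 * Returns 0 on success, -1 if the name does not follow the convention.
 */
int
name_to_ts(const char *name, time_t *t, uint32_t *serial)
{
	uint64_t v;
	uint32_t s;
	int used = 0;

	/* %n records how much was consumed, to reject trailing garbage. */
	if (sscanf(name, "0x%16" SCNx64 "-%8" SCNx32 "%n", &v, &s, &used) != 2
	    || name[used] != '\0')
		return -1;
	*t = (time_t)v;
	*serial = s;
	return 0;
}
```

Because the fields are zero-padded to a fixed width, a plain sorted
directory listing orders these names chronologically for non-negative
timestamps. A buffer of 28 bytes is enough for one name.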
The solution is mainly in userspace, not at the kernel level, because
UTF-8 allows "more" (characters beyond ASCII), still permits a
hexadecimal encoding that can represent any octet sequence, and is
compatible with all utilities expecting strings.
There is no panacea. But UTF-8 is the most interesting solution, because
it keeps existing filenames working, allows names that are not possible
today, and does not require many modifications to the existing code
base.
--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C