Subject: Re: a proposal for two new libc functions: shquote() and shquotev()
To: Chris G. Demetriou <cgd@netbsd.org>
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
List: tech-userlevel
Date: 03/04/2001 10:37:09
> (2) despite the additional difficulties, splitting is better than
> quoting, if you can reasonably demand that the values of the
> relevant environment variables (used for command names w/ possible
> options) be in ASCII, or
Splitting has the problem that you need to know a lot more about shell
syntax. (A "shsplit()" function would help, but you then get into the
never-ending question of "how much shell do you need to implement".)
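To make the "how much shell" problem concrete, here is a rough sketch of
a splitter that understands only blanks and single quotes. The name
sketch_shsplit() and its interface are made up for illustration; nothing
like it has been proposed. Everything it leaves out (double quotes,
backslashes, $-expansion, globbing) is exactly where the question starts.

```c
#include <stddef.h>

/*
 * Hypothetical, deliberately incomplete "shsplit": split line in place
 * into at most maxargs words.  Only blanks and single quotes are
 * understood; double quotes, backslashes, $-expansion, globbing, etc.
 * are all treated as ordinary bytes.  Returns the word count, or -1 on
 * an unterminated quote or too many words.
 */
int
sketch_shsplit(char *line, char **argv, int maxargs)
{
	char *in = line, *out;
	int argc = 0, quoted;

	for (;;) {
		/* Skip the blanks separating words. */
		while (*in == ' ' || *in == '\t')
			in++;
		if (*in == '\0')
			return argc;
		if (argc == maxargs)
			return -1;
		argv[argc++] = out = in;
		quoted = 0;
		/* Copy the word down over itself, dropping quote bytes. */
		while (*in != '\0' && (quoted || (*in != ' ' && *in != '\t'))) {
			if (*in == '\'')
				quoted = !quoted;
			else
				*out++ = *in;
			in++;
		}
		if (quoted)
			return -1;
		if (*in != '\0')
			in++;		/* step past the separating blank */
		*out = '\0';		/* out never runs ahead of in */
	}
}
```

Given a value such as "less -R 'my file'", this yields the three words
less, -R, and my file. Note that it is also byte-oriented, so it has the
same multibyte problem discussed below.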
> (3) you've gotta bite the bullet and do this multibyte...
>
> If (3), splitting probably better than quote-and-hand-to-/bin/sh,
> because /bin/sh isn't multibyte-char aware!
> Thoughts?
The point of shquote() (i.e., the "contract" with the programmer using
it) is to match the conventions of the shell used by popen() and
system(); if we get a multibyte-aware /bin/sh, shquote will need to be
multibyte aware. A shquote() portable among systems with and without
multibyte-aware shells will need to behave in a way which matches the
system it's running on.
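For the byte-oriented case, quoting for a POSIX-style sh can be sketched
as below: wrap the argument in single quotes and rewrite each embedded
single quote as '\''. This is only an illustration under that
assumption; sketch_shquote() is a made-up name and its snprintf()-like
interface is not meant to be the proposed shquote() interface.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not the proposed shquote() interface.
 * Quote arg for a POSIX-style sh by wrapping it in single quotes and
 * rewriting each embedded ' as '\''.  Works byte by byte, so it is only
 * correct when the shell also treats its input as bytes.
 * Returns the length needed (excluding the NUL); the output is
 * truncated, but still NUL-terminated, if bufsize is too small.
 */
size_t
sketch_shquote(const char *arg, char *buf, size_t bufsize)
{
	size_t n = 0;
	const char *p;

/* Store one byte if it fits (leaving room for the NUL); always count it. */
#define PUT(c) do { if (n + 1 < bufsize) buf[n] = (c); n++; } while (0)
	PUT('\'');
	for (p = arg; *p != '\0'; p++) {
		if (*p == '\'') {
			/* close quote, escaped quote, reopen quote */
			PUT('\''); PUT('\\'); PUT('\''); PUT('\'');
		} else
			PUT(*p);
	}
	PUT('\'');
#undef PUT
	if (bufsize > 0)
		buf[n < bufsize ? n : bufsize - 1] = '\0';
	return n;
}
```

The caller can compare the return value against the buffer size to
detect truncation before handing the result to system() or popen().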
> if we do not use mbrtowc() over localized string, we will make mistakes
> because some of stateful encodings include "$" and "\" in multibyte
> character streams (they are part of multibyte stream, so they should
> not be escaped).
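As a rough illustration of the quoted point, the sketch below walks a
string with mbrtowc() and counts only the characters that really are "$"
or "\" in the current locale's encoding. The function name
count_shell_specials() is made up, and the code assumes the ASCII
characters compare equal to their wide-character literals.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the characters in s that are really '$' or
 * '\' in the current locale's encoding.  Bytes with those values that
 * occur inside a multibyte character or shift sequence are consumed by
 * mbrtowc() and never compared on their own.
 * Assumes L'$' and L'\\' match what mbrtowc() produces for them.
 * Returns (size_t)-1 on an invalid or truncated sequence.
 */
size_t
count_shell_specials(const char *s)
{
	mbstate_t st;
	size_t left = strlen(s), count = 0, len;
	wchar_t wc;

	memset(&st, 0, sizeof(st));	/* start in the initial shift state */
	while (left > 0) {
		len = mbrtowc(&wc, s, left, &st);
		if (len == (size_t)-1 || len == (size_t)-2)
			return (size_t)-1;
		if (len == 0)		/* NUL; not expected within strlen(s) */
			break;
		if (wc == L'$' || wc == L'\\')
			count++;
		s += len;
		left -= len;
	}
	return count;
}
```

A quoting function would need the same kind of loop, escaping a byte
only when it forms a complete character by itself.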
I'll note in passing that, on a -current system with mbrtowc() in
wchar.h, there is no man page installed for mbrtowc() (and
possibly not for any of the other wchar.h functions); this makes it
difficult for someone unfamiliar with these APIs to learn them on
NetBSD and start writing multibyte-aware code.
- Bill