Subject: Re: a proposal for two new libc functions: shquote() and shquotev()
To: Chris G. Demetriou <cgd@netbsd.org>
From: Jun-ichiro itojun Hagino <itojun@iijlab.net>
List: tech-userlevel
Date: 03/04/2001 09:43:07
>> what will be the behavior when someone is using non-C locale? will the
>> function use mbrtowc() to grab a letter (parse string as a multibyte
>> string like printf does), or is it just for C locale?
>That's an interesting question.
>How do different locales impact sh's parsing of arguments? What
>locales are allowed when specifying arguments?
there are localized shells, which allow Japanese text input on the command
line, for example (see freebsd ports/japanese/tcsh). so I guess there
are users/configurations that would need locale-aware parsing in
shquote. for example, if EUC-JP is used to encode Japanese text, we
would need to:
- consume 2 octets if the octet we are looking at is 0x8e,
- consume 3 octets if it is 0x8f,
- otherwise, consume 2 octets if the 0x80 bit is set in it,
- consume 1 octet otherwise (ASCII)
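
To make the rules above concrete, here is a minimal sketch of what they look like when written by hand. eucjp_charlen() is a hypothetical name used only for illustration; it is not part of any libc and not part of the shquote() proposal. It inspects only the lead octet and does not validate the trailing octets.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return how many octets the EUC-JP character
 * starting at s[0] occupies, or 0 if fewer than that many octets
 * remain in the buffer.  Only the lead octet is examined.
 */
static size_t
eucjp_charlen(const unsigned char *s, size_t remaining)
{
	size_t need;

	assert(s != NULL);
	if (remaining == 0)
		return 0;
	if (s[0] == 0x8e)		/* 0x8e lead octet: 2 octets */
		need = 2;
	else if (s[0] == 0x8f)		/* 0x8f lead octet: 3 octets */
		need = 3;
	else if (s[0] & 0x80)		/* other octet with 0x80 set: 2 octets */
		need = 2;
	else				/* ASCII: 1 octet */
		need = 1;
	return (need <= remaining) ? need : 0;
}
```

A function like this only knows about EUC-JP, so every other encoding would need its own table of rules.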
mbrtowc() will effectively do this for whatever encoding the current
locale uses, so the calling code stays locale-independent.
if we do not use mbrtowc() on a localized string, we will make mistakes,
because some stateful encodings contain the octets for "$" and "\" inside
multibyte character sequences (they are part of a multibyte character, so
they should not be escaped).
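
As a rough illustration of the point above, the sketch below walks a string with mbrtowc() and puts a backslash only in front of characters that decode to '$', '\', '"' or '`'. escape_mb() is a hypothetical name and the choice of characters is an assumption for the example; this is not the proposed shquote() and makes no claim about how it would quote. It also assumes that when a decoded character is one of those four, its own octet is the last octet mbrtowc() consumed (any shift sequence comes before it), and that wchar_t values for these ASCII characters compare equal to the corresponding L'...' constants.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not the proposed shquote(): copy src to dst,
 * adding a backslash before characters that decode to $ \ " or `.
 * Octets with those values inside a multibyte character are copied
 * untouched.  Returns the output length, or (size_t)-1 on an invalid
 * or truncated sequence or if dst is too small.
 */
static size_t
escape_mb(char *dst, size_t dstsize, const char *src)
{
	mbstate_t st;
	size_t left = strlen(src), out = 0, n;
	wchar_t wc;
	int special;

	memset(&st, 0, sizeof(st));	/* start in the initial shift state */
	for (;;) {
		/* left + 1 lets mbrtowc() see the terminating NUL */
		n = mbrtowc(&wc, src, left + 1, &st);
		if (n == (size_t)-1 || n == (size_t)-2)
			return (size_t)-1;
		special = 0;
		if (n == 0)
			n = left;	/* at most a trailing shift sequence */
		else
			special = (wcschr(L"$\\\"`", wc) != NULL);
		if (out + n + special + 1 > dstsize)
			return (size_t)-1;
		/* Assumption: the special octet is the last one consumed. */
		memcpy(dst + out, src, n - special);
		out += n - special;
		if (special) {
			dst[out++] = '\\';
			dst[out++] = src[n - 1];
		}
		src += n;
		left -= n;
		if (left == 0)
			break;
	}
	dst[out] = '\0';
	return out;
}
```

A caller would need setlocale(LC_CTYPE, "") beforehand so that mbrtowc() follows the user's locale; without it the program runs in the "C" locale.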
(caveat: i personally do not use those localized shells, at all)
itojun