tech-userlevel: Re: a proposal for two new libc functions: shquote() and shquotev()

Subject: Re: a proposal for two new libc functions: shquote() and shquotev()
To: Chris G. Demetriou <cgd@netbsd.org>
From: Jun-ichiro itojun Hagino <itojun@iijlab.net>
List: tech-userlevel
Date: 03/04/2001 09:43:07

>> 	what will be the behavior when someone is using non-C locale?  will the
>> 	function use mbrtowc() to grab a letter (parse string as a multibyte
>> 	string like printf does), or is it just for C locale?
>That's an interesting question.
>How do different locales impact sh's parsing of arguments?  What
>locales are allowed when specifying arguments?

	there are localized shells, which allow Japanese text input on the
	command line, for example (see freebsd ports/japanese/tcsh).  so I
	guess there are users/configurations that would need locale-ready
	parsing for shquote.  for example, if EUC-JP is used to encode Japanese
	text, we would need to:
	- consume 2 octets if the octet we are looking at is 0x8e,
	- consume 3 octets if it is 0x8f,
	- consume 2 octets if it has the 0x80 bit set otherwise,
	- consume 1 octet otherwise (ASCII)
	mbrtowc() will effectively do this for us, so the calling code does not
	need to know which encoding the locale uses.
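
	a minimal sketch of that kind of walk, assuming the caller has already
	selected the locale with setlocale(LC_CTYPE, "").  mb_count_chars() is a
	hypothetical name used only for illustration, it is not part of the
	proposal:

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the multibyte characters in s according to
 * the current LC_CTYPE locale.  mbrtowc() decides how many octets each
 * character occupies, so no encoding-specific table is needed here.
 * Returns (size_t)-1 if s contains an invalid or truncated sequence.
 */
size_t
mb_count_chars(const char *s)
{
	mbstate_t st;
	size_t left = strlen(s);
	size_t len, count = 0;

	memset(&st, 0, sizeof(st));	/* start in the initial shift state */
	while (left > 0) {
		len = mbrtowc(NULL, s, left, &st);
		if (len == (size_t)-1 || len == (size_t)-2)
			return (size_t)-1;	/* invalid or incomplete */
		if (len == 0)
			break;			/* NUL character; defensive */
		s += len;
		left -= len;
		count++;
	}
	return count;
}
```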

	if we do not use mbrtowc() over a localized string, we will make
	mistakes, because some stateful encodings include "$" and "\" octets in
	multibyte character streams (they are part of a multibyte character,
	so they should not be escaped).
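
	a rough sketch of escaping that respects character boundaries.
	mb_escape() is a hypothetical helper, not the proposed shquote(); the
	set of escaped characters is only an example.  it assumes the locale
	was set with setlocale(LC_CTYPE, "") and a stateless encoding such as
	EUC-JP.  stateful encodings would need more care, because mbrtowc()
	consumes a shift sequence together with the character that follows it:

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: copy src to dst, putting a backslash before
 * '$', '\', '"' and '`' only when they are complete one-octet
 * characters in the current LC_CTYPE locale.  Octets that belong to a
 * longer multibyte character are copied unchanged.
 * Returns the number of octets written (excluding the NUL), or
 * (size_t)-1 on an invalid sequence or if dst is too small.
 */
size_t
mb_escape(char *dst, size_t dstsize, const char *src)
{
	mbstate_t st;
	size_t left = strlen(src);
	size_t len, out = 0;

	memset(&st, 0, sizeof(st));
	while (left > 0) {
		len = mbrtowc(NULL, src, left, &st);
		if (len == (size_t)-1 || len == (size_t)-2 || len == 0)
			return (size_t)-1;
		/* conservatively reserve room for a backslash and the NUL */
		if (dstsize - out < len + 2)
			return (size_t)-1;
		if (len == 1 && strchr("$\\\"`", *src) != NULL)
			dst[out++] = '\\';
		memcpy(dst + out, src, len);
		out += len;
		src += len;
		left -= len;
	}
	if (out >= dstsize)
		return (size_t)-1;
	dst[out] = '\0';
	return out;
}
```

	a byte-by-byte loop would instead test every octet against the escape
	set, which is where the mistakes described above come from.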

	(caveat: i personally do not use those localized shells, at all)

itojun