Subject: how to write secure internationalized shell scripts
To: None <tech-security@netbsd.org>
From: Bruno Haible <bruno@clisp.org>
List: tech-security
Date: 09/04/2003 17:55:09
Hi all,
Could some of you please tell me whether the proposed methodology for using
internationalization in shell scripts, based on GNU gettext, is safe enough?
The proposal for a hello-world program (that I want to incorporate in the
GNU gettext manual) looks like this:
===============================================================
#! /bin/sh
# Find a way to echo strings without interpreting backslash.
if test "X`(echo '\t') 2>/dev/null`" = 'X\t'; then
  echo='echo'
else
  if test "X`(printf '%s\n' '\t') 2>/dev/null`" = 'X\t'; then
    echo='printf %s\n'
  else
    echo_func () {
      cat <<EOT
$*
EOT
    }
    echo='echo_func'
  fi
fi
TEXTDOMAIN=hello
export TEXTDOMAIN
TEXTDOMAINDIR=/absolute/path/to/localedir
export TEXTDOMAINDIR
# Test whether the locale encoding is good or weird.
locale_charset_weird () {
  case `locale charmap | tr a-z A-Z` in
    BIG5 | BIG5-HKSCS | GBK | GB18030 | SHIFT_JIS | JOHAB) (exit 0);;
    *) (exit 1);;
  esac
}
use_backquote_workaround=
if locale_charset_weird; then
  s=`echo '()echo ()' | LC_ALL=C tr '(' '\340' | LC_ALL=C tr ')' '\140'`
  if eval echo "$s" | grep echo > /dev/null; then
    : # OK, the shell can recognize multibyte characters correctly.
  else
    # The shell can mistakenly interpret double-byte characters like \xe0\x60.
    use_backquote_workaround=yes
  fi
fi
if test -n "$use_backquote_workaround"; then
  eval_gettext () {
    _string=`gettext "$1" | LC_ALL=C tr -d '\177' | LC_ALL=C tr '\140' '\177'`
    eval _string="\"$_string\""
    $echo "$_string" | LC_ALL=C tr '\177' '\140'
  }
  eval_ngettext () {
    _string=`ngettext "$1" "$2" "$3" | LC_ALL=C tr -d '\177' | LC_ALL=C tr '\140' '\177'`
    eval _string="\"$_string\""
    $echo "$_string" | LC_ALL=C tr '\177' '\140'
  }
else
  eval_gettext () {
    _string=`gettext "$1"`
    eval _string="\"$_string\""
    $echo "$_string"
  }
  eval_ngettext () {
    _string=`ngettext "$1" "$2" "$3"`
    eval _string="\"$_string\""
    $echo "$_string"
  }
fi
# gettext can be used with literal strings without variables.
$echo "`gettext "Hello world"`"
# eval_gettext is for the cases where the string refers to variables.
$echo "`eval_gettext "Hello Mr. \\$USER, your terminal type is \\$TERM."`"
# eval_ngettext is for plural forms.
$echo "`eval_ngettext "a piece of cake" "\\$n pieces of cake" $n`"
===========================================================================
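The backquote workaround in the script can be tried in isolation. The following is only a sketch: eval_masked is a hypothetical name, and it takes the string directly instead of calling gettext, so no message catalog is needed. It assumes printf is available. Each backquote byte (octal 140) is turned into DEL (octal 177) before the eval and turned back afterwards, so the shell never sees a backquote while it evaluates the string.

```shell
#!/bin/sh
# Hypothetical helper (not part of the proposal): evaluate "$1" as a
# double-quoted string while hiding backquote bytes from the shell.
eval_masked () {
  # Drop stray DEL bytes, then replace each backquote (140) with DEL (177).
  _masked=`printf '%s\n' "$1" | LC_ALL=C tr -d '\177' | LC_ALL=C tr '\140' '\177'`
  # Variables are substituted here; DEL has no special meaning to the shell.
  eval _masked="\"$_masked\""
  # Restore the backquotes for output.
  printf '%s\n' "$_masked" | LC_ALL=C tr '\177' '\140'
}

name=world
eval_masked 'hello `$name`'
```

With the masking in place, the backquotes survive as literal characters and only $name is substituted.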
The idea is that a message catalog created by the translator contains, say,
#, sh-evaluated
msgid "Hello world"
msgstr "Hallo Welt"
#, sh-evaluated sh-format
msgid "Hello Mr. $USER, your terminal type is $TERM."
msgstr "Hallo Herr $USER, Ihr Terminal ist ein $TERM."
#, sh-evaluated sh-format
msgid "a piece of cake"
msgid_plural "$n pieces of cake"
msgstr[0] "ein Stück Kuchen"
msgstr[1] "$n Stück Kuchen"
Such a message catalog is transformed to a .mo file by the msgfmt program.
The 'sh-format' marker is used by msgfmt: "msgfmt -c" verifies that the
translation (msgstr) refers only to those variables that the original string
(msgid) already refers to. The 'sh-evaluated' marker is used by msgfmt
as well: "msgfmt -c" verifies that the translation does not use dangerous
constructs like `...` or $(...).
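To make the intent of the 'sh-evaluated' check concrete, here is a rough sketch of such a test written in shell. It is not msgfmt's implementation; translation_is_safe is a hypothetical name. It looks only for the two constructs named above, so it should not be read as a complete filter.

```shell
#!/bin/sh
# Hypothetical check: succeed only if "$1" contains neither a backquote
# nor the sequence "$(".
translation_is_safe () {
  case $1 in
    *'`'* | *'$('*) return 1 ;;
    *) return 0 ;;
  esac
}

for msgstr in 'Hallo Herr $USER' 'Hallo $(id)' 'Hallo `id`'; do
  if translation_is_safe "$msgstr"; then
    printf 'ok:       %s\n' "$msgstr"
  else
    printf 'rejected: %s\n' "$msgstr"
  fi
done
```

The first string passes; the other two are rejected.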
The 'gettext' and 'ngettext' programs access this .mo file to extract
the translations and convert them to the current locale's encoding. Then
the shell script functions 'eval_gettext' or 'eval_ngettext' evaluate
the resulting string, to get the variables' values substituted into it.
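The evaluation step can be observed without a .mo file. In this sketch, eval_string is a hypothetical stand-in for eval_gettext that receives the already translated string as its argument; the values assigned to USER and TERM are made up for the demonstration, and printf is assumed to be available.

```shell
#!/bin/sh
# Hypothetical stand-in for eval_gettext: "$1" plays the role of the
# string that gettext would have returned.
eval_string () {
  _string=$1
  # Same evaluation step as in the proposal.
  eval _string="\"$_string\""
  printf '%s\n' "$_string"
}

USER=alice
TERM=vt100
eval_string 'Hallo Herr $USER, Ihr Terminal ist ein $TERM.'
# A command substitution inside the string is run by the same eval.
eval_string 'Hallo $(echo INJECTED)'
```

The first call prints the sentence with both variables substituted. The second call prints "Hallo INJECTED", which shows why the translations have to be checked before they reach the eval.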
Can you see security problems associated with this methodology?
Bruno