Subject: Re: shell quoting problems
To: David Laight <david@l8s.co.uk>
From: Greg A. Woods <woods@weird.com>
List: tech-userlevel
Date: 11/26/2002 14:15:46
[ On Tuesday, November 26, 2002 at 18:11:18 (+0000), David Laight wrote: ]
> Subject: shell quoting problems
>
> Under 'command substitution' is:
>
> "A single-quoted or double-quoted string that begins, but does not
> end, within the "`...`" sequence produces undefined results."
>
> From which one could infer that in "`... "abc" ...`" the characters
> abc are a quoted string, rather than being outside the ones that
> contain the ` characters.
Well, yes, sort of...  My understanding is that they are a quoted
string for the command expressed between the back-quotes. The outer
double quotes are what make the output of the command into a quoted
string itself.
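
A minimal sketch of that reading, assuming a shell that follows the POSIX
text quoted further down (older shells may behave differently, which is the
point of this thread). The variable names are made up for illustration:

```shell
# Unquoted back-quotes: the command's output "a   b" is field-split.
set -- `echo "a   b"`
unquoted_count=$#

# Double-quoted back-quotes: the inner quotes still belong to the inner
# command, and the outer quotes keep that command's output as one word.
set -- "`echo "a   b"`"
quoted_count=$#
quoted_value=$1

echo "$unquoted_count $quoted_count [$quoted_value]"
```
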
There is mention of some bogus back-quote handling of some older
implementations in the rationale section for quoting in IEEE P1003.2
Draft 11.2:
Some systems have allowed the end of the word to terminate the backquoted
command substitution, such as in
    "`echo hello"
This usage is undefined in POSIX.2, where the matching backquote is
required. The other undefined usage can be illustrated by the example:
    sh -c '` echo "foo`'
The description of the recursive actions involving command substitution
can be illustrated with an example. Upon recognizing the introduction of
command substitution, the shell must parse input (in a new context),
gathering the ``source'' for the command substitution until an unbalanced
) or ` is located. For example, in the following
    echo "$(date; echo "
    one" )"
the double-quote following the echo does not terminate the first double-
quote; it is part of the command substitution ``script.'' Similarly, in
    echo "$(echo *)"
the asterisk is not quoted since it is inside command substitution;
however,
    echo "$(echo "*")"
is quoted (and represents the asterisk character itself).
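
The last two cases from the rationale can be tried in a scratch directory.
This is only a sketch: mktemp -d is assumed to be available (it is not
required by POSIX), and the file names "one" and "two" are invented:

```shell
dir=$(mktemp -d) || exit 1
touch "$dir/one" "$dir/two"

# The '*' is unquoted inside the substitution, so the inner echo sees a glob.
expanded="$(cd "$dir" && echo *)"

# The inner double quotes quote the '*', so it is passed through literally.
literal="$(cd "$dir" && echo "*")"

rm -rf "$dir"
echo "expanded=[$expanded] literal=[$literal]"
```
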
Here is the text from IEEE Std. 1003.1-2001, as presented in SUSv3,
which describes the shell command substitution rules:
Command Substitution
Command substitution allows the output of a command to be substituted
in place of the command name itself. Command substitution shall occur
when the command is enclosed as follows:
    $(command)
or (backquoted version):
    `command`
The shell shall expand the command substitution by executing command
in a subshell environment (see Shell Execution Environment) and
replacing the command substitution (the text of command plus the
enclosing "$()" or backquotes) with the standard output of the
command, removing sequences of one or more <newline>s at the end of
the substitution. Embedded <newline>s before the end of the output
shall not be removed; however, they may be treated as field delimiters
and eliminated during field splitting, depending on the value of IFS
and quoting that is in effect.
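
A small sketch (not part of the standard's text) of the newline rule above,
assuming the default IFS; the variable names are arbitrary:

```shell
# Three trailing newlines are stripped; the embedded one is kept.
out=$(printf 'a\nb\n\n\n')
length=${#out}          # counts a, <newline>, b

# Unquoted, the embedded newline acts as a field delimiter.
set -- $out
split_count=$#

# Quoted, the value stays a single field with the newline inside it.
set -- "$out"
quoted_count=$#

echo "$length $split_count $quoted_count"
```
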
Within the backquoted style of command substitution, backslash shall
retain its literal meaning, except when followed by: '$' , '`' , or
'\' (dollar sign, backquote, backslash). The search for the matching
backquote shall be satisfied by the first backquote found without a
preceding backslash; during this search, if a non-escaped backquote is
encountered within a shell comment, a here-document, an embedded
command substitution of the $( command) form, or a quoted string,
undefined results occur. A single-quoted or double-quoted string that
begins, but does not end, within the "`...`" sequence produces
undefined results.
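
To illustrate the backslash rule (a sketch, not text from the standard):
inside back-quotes a backslash before '$' is consumed before the command is
parsed, while the $( ) form leaves the command text alone. printf is used
rather than echo so that echo's own backslash handling does not interfere:

```shell
# Back-quotes: "\$" becomes "$" first, so printf is handed '$x'.
bq=`printf '%s\n' '\$x'`

# $( ): the text is taken as-is, so printf is handed '\$x'.
dp=$(printf '%s\n' '\$x')

printf '%s\n' "$bq" "$dp"
```
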
With the $( command) form, all characters following the open
parenthesis to the matching closing parenthesis constitute the
command. Any valid shell script can be used for command, except a
script consisting solely of redirections which produces unspecified
results.
The results of command substitution shall not be processed for further
tilde expansion, parameter expansion, command substitution, or
arithmetic expansion. If a command substitution occurs inside
double-quotes, it shall not be performed on the results of the
substitution.
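
A sketch of what "not processed for further expansion" means in practice
(the variable name and the string are invented for the example):

```shell
name=world

# The command prints text that looks like expansions; the shell does not
# expand it a second time after substituting it.
out=$(printf '%s' '$name ~ $(date)')

printf '%s\n' "$out"
```
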
Command substitution can be nested. To specify nesting within the
backquoted version, the application shall precede the inner backquotes
with backslashes, for example:
    \`command\`
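
For comparison, a sketch of one level of nesting in both styles (not part of
the standard's text):

```shell
# Back-quoted form: the inner back-quotes must be escaped.
nested_bq=`echo outer \`echo inner\``

# $( ) form: nests without any escaping.
nested_dp=$(echo outer $(echo inner))

echo "$nested_bq / $nested_dp"
```
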
If the command substitution consists of a single subshell, such as:
    $( (command) )
a conforming application shall separate the "$(" and '(' into two
tokens (that is, separate them with white space). This is required to
avoid any ambiguities with arithmetic expansion.
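
A short sketch of the distinction being protected here (illustrative only):

```shell
# "$(" followed by white space and "(": command substitution of a subshell.
sub=$( (cd / && pwd) )

# "$((" with no space: arithmetic expansion.
arith=$((1 + 2))

echo "$sub $arith"
```
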
As for shell quoting, well, here is the authoritative text from IEEE
Std. 1003.1-2001, as presented in SUSv3 (which you should download for
yourself):
Quoting
Quoting is used to remove the special meaning of certain characters or
words to the shell. Quoting can be used to preserve the literal
meaning of the special characters in the next paragraph, prevent
reserved words from being recognized as such, and prevent parameter
expansion and command substitution within here-document processing
(see Here-Document).
The application shall quote the following characters if they are to
represent themselves:
    | & ; < > ( ) $ ` \ " ' <space> <tab> <newline>
and the following may need to be quoted under certain circumstances.
That is, these characters may be special depending on conditions
described elsewhere in this volume of IEEE Std 1003.1-2001:
    * ? [ # ~ = %
The various quoting mechanisms are the escape character,
single-quotes, and double-quotes. The here-document represents another
form of quoting; see Here-Document.
Escape Character (Backslash)
A backslash that is not quoted shall preserve the literal value of the
following character, with the exception of a <newline>. If a <newline>
follows the backslash, the shell shall interpret this as line
continuation. The backslash and <newline>s shall be removed before
splitting the input into tokens. Since the escaped <newline> is
removed entirely from the input and is not replaced by any white
space, it cannot serve as a token separator.
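
A sketch of the last sentence (not part of the standard's text): because the
backslash and newline vanish without leaving white space, the two halves
join into one word:

```shell
# "foo\<newline>bar" is read as the single word "foobar".
set -- foo\
bar
word_count=$#
joined=$1

echo "$word_count $joined"
```
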
Single-Quotes
Enclosing characters in single-quotes ( '' ) shall preserve the
literal value of each character within the single-quotes. A
single-quote cannot occur within single-quotes.
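
A sketch of the single-quote rule and the usual workaround for embedding a
single quote (close the string, add an escaped quote, reopen). The strings
are made up for the example:

```shell
# Nothing is special inside single quotes: no expansion, no escapes.
literal='$HOME `date` \n "x"'

# A single quote cannot appear inside, so splice one in from outside.
with_quote='it'\''s'

printf '%s\n' "$literal" "$with_quote"
```
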
Double-Quotes
Enclosing characters in double-quotes ( "" ) shall preserve the
literal value of all characters within the double-quotes, with the
exception of the characters dollar sign, backquote, and backslash, as
follows:
$
The dollar sign shall retain its special meaning introducing
parameter expansion (see Parameter Expansion), a form of
command substitution (see Command Substitution), and
arithmetic expansion (see Arithmetic Expansion).
The input characters within the quoted string that are also
enclosed between "$(" and the matching ')' shall not be
affected by the double-quotes, but rather shall define that
command whose output replaces the "$(...)" when the word is
expanded. The tokenizing rules in Token Recognition, not
including the alias substitutions in Alias Substitution,
shall be applied recursively to find the matching ')' .
Within the string of characters from an enclosed "${" to the
matching '}' , an even number of unescaped double-quotes or
single-quotes, if any, shall occur. A preceding backslash
character shall be used to escape a literal '{' or '}' . The
rule in Parameter Expansion shall be used to determine the
matching '}' .
`
The backquote shall retain its special meaning introducing the
other form of command substitution (see Command
Substitution). The portion of the quoted string from the
initial backquote and the characters up to the next backquote
that is not preceded by a backslash, having escape characters
removed, defines that command whose output replaces "`...`"
when the word is expanded. Either of the following cases
produces undefined results:
+ A single-quoted or double-quoted string that begins, but does
not end, within the "`...`" sequence
+ A "`...`" sequence that begins, but does not end, within the
same double-quoted string
\
The backslash shall retain its special meaning as an escape
character (see Escape Character (Backslash)) only when
followed by one of the following characters when considered
special:
    $ ` " \ <newline>
The application shall ensure that a double-quote is preceded by a
backslash to be included within double-quotes. The parameter '@' has
special meaning inside double-quotes and is described in Special
Parameters.
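
A sketch of the backslash rule inside double quotes (illustrative values
only): the backslash escapes only the characters listed above and is
otherwise kept as an ordinary character:

```shell
# 'n' is not in the list, so the backslash stays: four characters a \ n b.
kept="a\nb"

# '$', '"' and '\' are in the list, so each backslash is removed.
escaped="\$HOME \"q\" \\"

printf '%s\n' "$kept" "$escaped"
```
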
Token Recognition
The shell shall read its input in terms of lines from a file, from a
terminal in the case of an interactive shell, or from a string in the
case of sh -c or system(). The input lines can be of unlimited
length. These lines shall be parsed using two major modes: ordinary
token recognition and processing of here-documents.
When an io_here token has been recognized by the grammar (see
Shell Grammar), one or more of the subsequent lines immediately
following the next NEWLINE token form the body of one or more
here-documents and shall be parsed according to the rules of
Here-Document.
When it is not processing an io_here, the shell shall break its input
into tokens by applying the first applicable rule below to the next
character in its input. The token shall be from the current position
in the input until a token is delimited according to one of the rules
below; the characters forming the token are exactly those in the
input, including any quoting characters. If it is indicated that a
token is delimited, and no characters have been included in a token,
processing shall continue until an actual token is delimited.
1. If the end of input is recognized, the current token shall be
delimited. If there is no current token, the end-of-input
indicator shall be returned as the token.
2. If the previous character was used as part of an operator and the
current character is not quoted and can be used with the current
characters to form an operator, it shall be used as part of that
(operator) token.
3. If the previous character was used as part of an operator and the
current character cannot be used with the current characters to
form an operator, the operator containing the previous character
shall be delimited.
4. If the current character is backslash, single-quote, or
double-quote ( '\' , "'" , or '"' ) and it is not quoted, it shall
affect quoting for subsequent characters up to the end of the
quoted text. The rules for quoting are as described in Quoting.
During token recognition no substitutions shall be actually
performed, and the result token shall contain exactly the
characters that appear in the input (except for <newline>
joining), unmodified, including any embedded or enclosing quotes
or substitution operators, between the quote mark and the end of
the quoted text. The token shall not be delimited by the end of
the quoted field.
5. If the current character is an unquoted '$' or '`' , the shell
shall identify the start of any candidates for parameter expansion
(Parameter Expansion), command substitution (Command
Substitution), or arithmetic expansion (Arithmetic Expansion)
from their introductory unquoted character sequences: '$' or
"${" , "$(" or '`' , and "$((" , respectively. The shell shall
read sufficient input to determine the end of the unit to be
expanded (as explained in the cited sections). While processing
the characters, if instances of expansions or quoting are found
nested within the substitution, the shell shall recursively
process them in the manner specified for the construct that is
found. The characters found from the beginning of the substitution
to its end, allowing for any recursion necessary to recognize
embedded constructs, shall be included unmodified in the result
token, including any embedded or enclosing substitution operators
or quotes. The token shall not be delimited by the end of the
substitution.
6. If the current character is not quoted and can be used as the
first character of a new operator, the current token (if any)
shall be delimited. The current character shall be used as the
beginning of the next (operator) token.
7. If the current character is an unquoted <newline>, the current
token shall be delimited.
8. If the current character is an unquoted <blank>, any token
containing the previous character is delimited and the current
character shall be discarded.
9. If the previous character was part of a word, the current
character shall be appended to that word.
10. If the current character is a '#' , it and all subsequent
characters up to, but excluding, the next <newline> shall be
discarded as a comment. The <newline> that ends the line is not
considered part of the comment.
11. The current character is used as the start of a new word.
Once a token is delimited, it is categorized as required by the
grammar in Shell Grammar.
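
One visible consequence of the rule ordering above, as a sketch (not part of
the standard's text): a '#' only starts a comment where a new word could
begin, because rule 9 claims it first when it appears mid-word:

```shell
# '#' in the middle of a word is appended to that word (rule 9).
mid=$(echo a#b)

# '#' at the start of a word begins a comment (rule 10): only "a" remains.
set -- a #b c
comment_count=$#

# The same characters with no blank before '#' give two ordinary words.
set -- a#b c
word_count=$#

echo "$mid $comment_count $word_count"
```
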
IMNSHO the back-quote form of command-substitution should have been
deprecated LONG ago!
--
Greg A. Woods
+1 416 218-0098; <g.a.woods@ieee.org>; <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>