tech-userlevel archive
Re: shell (/bin/sh) pattern matching bugs
[Readers, please refer to Robert Elz's full, enlightening answer; I
have trimmed it here.]
On Sun, Jun 24, 2018 at 07:49:25PM +0700, Robert Elz wrote:
> | - [Suppression of the double quotes?
>
> This is, of course, the heart of the matter...
>
> In POSIX, quote removal is explicitly not done on case
> patterns. That is, the expansions that are done are listed,
> and quote removal is not one of them.
>
> So...
>
> | But this doesn't change anything in
> | the bracket expression];
>
> It would, as, assuming the current literal text, an input string
> which was a double quote (as in '"' or \") would match, as the
> double quote character would appear in the [ ] expression
> in the pattern.
>
> Of course that is clearly absurd, and a bug report on the POSIX
> text was submitted a while ago to include quote removal in the
> list of operations to perform on case patterns.
>
> Unfortunately, it isn't that simple, as just doing quote
> removal on patterns would cause
>
> case x in ("*") echo match;; esac
>
> to match as the quote removal would leave the
> pattern being just an asterisk, which matches anything,
> which is not what is supposed to happen.
>
> So the current proposed new text (which had been
> accepted, but now is being discussed again, and will
> be changed) also specified that along with quote removal,
> any "pattern magic" characters in the quoted part of the
> pattern would be \ escaped so they remained literal,
> so "quote removal" of the "*" would produce \* not *
> and so the pattern matching would look for a literal
> asterisk rather than anything - which is what is wanted.
Thanks for the explanations!
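To make the quoted-asterisk case above concrete, here is a minimal
sketch that can be run in an existing shell. The function name
match_quoted_star is hypothetical, introduced only for illustration.
The outputs noted in the comments are what I see with shells that keep
quoted pattern characters literal (bash and dash, for instance).

```shell
# Hypothetical helper: reports whether $1 matches the quoted pattern "*".
match_quoted_star() {
    case $1 in
        ("*") echo "match" ;;      # quoted asterisk: a literal '*' only
        (*)   echo "no match" ;;   # unquoted asterisk: anything else
    esac
}

match_quoted_star x      # prints "no match"
match_quoted_star '*'    # prints "match"
```

Naive quote removal alone would turn the first arm into a bare
asterisk, so the first call would wrongly print "match".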
FWIW, as a POSIX shell user, I would expect something more intuitive
than what is proposed (if I understand it correctly):
a) In all contexts, including case patterns, the expansions, including
quote removal, are done;
b) _After that_, the patterns are interpreted according to their own
rules; any escaped double quotes that remain are simply part of a
string of literals.
That is, given:
var="[:alpha:]"
(["$var"]) would become, after a), ([[:alpha:]]), and then '[' would not
match,
while
([\"$var\"]) would become, after a), (["[:alpha:]"]), and then, with
'"[:alpha:]"' interpreted as a string of literals, '[' would
match.
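For anyone who wants to compare this expectation with what a given
shell actually does, here is a small probe, offered as a sketch only.
The helper names probe_quoted and probe_escaped are hypothetical. I
deliberately give no expected output: the results depend on the shell,
which is the very subject of this thread.

```shell
var="[:alpha:]"

# Hypothetical probe: the expansion is double-quoted inside the brackets.
probe_quoted() {
    case $1 in
        (["$var"]) echo "match" ;;
        (*)        echo "no match" ;;
    esac
}

# Hypothetical probe: the double quotes are backslash-escaped and the
# expansion itself is unquoted.
probe_escaped() {
    case $1 in
        ([\"$var\"]) echo "match" ;;
        (*)          echo "no match" ;;
    esac
}

# Try a bracket, a letter and a double quote against both patterns.
for s in '[' 'a' '"'; do
    printf '%s: quoted=%s escaped=%s\n' \
        "$s" "$(probe_quoted "$s")" "$(probe_escaped "$s")"
done
```

Running the same file under several shells (sh, dash, bash, ksh) and
diffing the output shows where the implementations disagree.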
I think that POSIX shell users like me are used to the escaping dance
when they feed sed(1), from a shell, a (non-shell) regular expression,
so it seems to me that this would be reasonably backward compatible
and would be the least surprising behavior.
Just my 2 cents.
Best regards.
PS: I don't know whether you have already modified the sh(1) man page
(I'm on 7.1.1, not on -current), but I think that the case grammar
should say that the (pattern) form is valid, the leading '(' being
optional. Since all the examples, and the man page, always show
"pattern)", the (pattern) form can be surprising, "(...)" being used in
some shells for lists or arrays.
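As an illustration of the two spellings, here is a minimal sketch. The
function name classify is hypothetical. POSIX allows the optional
leading '(' before each pattern list, and both forms can be mixed in
the same case statement.

```shell
# Hypothetical helper mixing both spellings of a case arm.
classify() {
    case $1 in
        (a*) echo "with paren" ;;       # optional leading '(' present
        b*)  echo "without paren" ;;    # traditional "pattern)" form
        (*)  echo "other" ;;
    esac
}

classify apple     # prints "with paren"
classify banana    # prints "without paren"
classify cherry    # prints "other"
```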
--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C