tech-userlevel archive
Re: sed(1) / BRE bug?
Hello Robert,
On Mon, Oct 12, 2020 at 12:06:57AM +0700, Robert Elz wrote:
> Date: Sun, 11 Oct 2020 11:45:12 +0200
> From: tlaronde%polynum.com@localhost
> Message-ID: <20201011094512.GA356%polynum.com@localhost>
>
>
> | The problem? the leading '$' is not escaped (I was trying to get the var
> | name from a Makefile)...
> |
> | Is this a bug or is this behavior undefined or even required by
> | POSIX?
>
> Not a bug, and (kind of) required, kind of, in that a '\' where it
> is not required produces undefined results (XBD 9.3.2):
>
> The interpretation of an ordinary character preceded by an
> unescaped <backslash> ('\\') is undefined, except for:
>
> The exceptions have nothing to do with '$'.
>
> "Ordinary character" is defined in the previous sentence, same section
>
> An ordinary character is a BRE that matches itself: any character
> in the supported character set, except for the BRE special
> characters listed in Section 9.3.3.
>
> 9.3.3 does include '$' but:
>
> $ The <dollar-sign> shall be special when used as an anchor.
>
> So, '$' is an ordinary character, except when it is an anchor.
>
> Anchors are defined in XBD 9.3.8:
>
> A BRE can be limited to matching expressions that begin or end
> a string; this is called ``anchoring''. The <circumflex> and
> <dollar-sign> special characters shall be considered BRE
> anchors in the following contexts:
>
> (skip '^' for this message)
>
> 2. A <dollar-sign> ('$') shall be an anchor when used as the last
> character of an entire BRE.
>
> Your (first) '$' was not the last character of the BRE, so is not an anchor,
> and hence is not special for this reason. The section continues:
>
> The implementation may treat a <dollar-sign> as an anchor when
> used as the last character of a subexpression.
>
> That one is optional for the implementation so you could not rely upon
> it working, but here your first '$' is not at the end of a subexpression,
> so it wouldn't qualify anyway. [This option is actually very ugly, as
> when you want to use a '$' at the end of a subexpression to match itself,
> rather than be an anchor, you must escape it with '\' if the implementation
> would treat it as an anchor, but not escape it if it wouldn't.]
>
> The rest of the paragraph just explains how matching by a '$' that is
> an anchor works. Yours isn't, so is just an ordinary character, and
> so matches itself, and would produce undefined results if escaped (the
> undefined result could be for it to simply match itself, making \$ always
> mean to match a literal '$' but a sed (or anything else using BREs) script
> should not rely upon that).
>
> This behaviour ('^' is special only when it is the very first character
> of the RE, and '$' is special only when it is the absolute last) is traditional
> RE behaviour going back to the very earliest unix RE's (as in "ed").
>
> For ERE's the rules are slightly different, but for anchors, I think only in
> that they always work in subexpressions, it isn't an implementation option.
> So, even in an ERE your first '$' should not be escaped (and certainly does
> not require to be).
>
Thank you for the information! (I would not have been able to weave my
way thru the standard to finally find this answer.)
I will have to verify that, on every system, the unescaped leading dollar
does not cause something nasty (it is used for installation on every
system, where I try to rely only on the minimum POSIX.2 requirements).
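
A minimal sketch of such a check, assuming a hypothetical Makefile line
(the real input is not shown in this thread) and hypothetical variable
names. It only exercises the two cases the standard defines: an unescaped
'$' that is not the last character of the BRE, and a '$' that is.

```shell
#!/bin/sh
# Hypothetical Makefile line; the actual input is not given in the thread.
line='CFLAGS = $(OPT) -Wall'

# Unescaped '$' that is NOT the last character of the BRE: per XBD 9.3.8
# it is not an anchor, so it is an ordinary character matching itself.
# In a BRE, plain '(' and ')' are ordinary too; '\(' and '\)' group.
name=$(printf '%s\n' "$line" | sed -n 's/.*$(\([A-Za-z_]*\)).*/\1/p')

# '$' as the last character of the entire BRE: an anchor. Only the line
# that ends in 'a' should be selected, not the one ending in a literal '$'.
anchored=$(printf '%s\n' 'a$' 'ba' | sed -n '/a$/p')

printf 'name=%s anchored=%s\n' "$name" "$anchored"
```

On a conforming sed this should print the variable name for the first case
and only the line 'ba' for the second; anything else on a given system
would be the "something nasty" to look out for.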
Best regards,
--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
http://www.sbfa.fr/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C