tech-userlevel archive
Re: bin/39002: harmful AWK extension: non-portable escaped character
On Thu, Jun 26, 2008 at 10:52:57AM -0400, Greg A. Woods; Planix, Inc. wrote:
>>> It's not clearly defined there at all.
>>
>> ...which is why it ought to generate a warning.
>
> No, I don't think so.
>
> As others have shown the standard (POSIX) does not define the behaviour of
> backslashes in a string constant in AWK.
>
> However as I've shown the history of AWK in the context of UNIX not only
> clearly defines the purpose and meaning of backslashes in a string constant
> (and separately in regular expressions), but a rationale is also plainly
> evident for the way these things have always worked the way they do in all
> but one(?) "rogue"(*) implementation.
I'm not clear on what rationale you're thinking of. If someone writes
the string constant "^.*\.txt$", it's evident upon inspection by a
human that they intended the \. to escape the regexp metacharacter,
that is, they meant to write "^.*\\.txt$".
This is doubtless why mawk does what it reportedly does, but as you
note it's not what all the other implementations do.
What I don't understand is why you think it's desirable to assume the
opposite meaning, which is clearly not what anyone intends or wants.
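
For illustration, a minimal sketch of the portable spelling, assuming a
POSIX shell and any awk on the PATH. The helper name `matches` and the
sample file names are made up for this example. The single-backslash
spelling "^.*\.txt$" is deliberately not run here, since its result
depends on the implementation.

```shell
#!/bin/sh
# Hypothetical helper: report whether the argument matches the dynamic
# regexp built from the string constant "^.*\\.txt$".
# Inside the awk string constant, \\ becomes a single backslash, so the
# regexp engine receives ^.*\.txt$ and treats the dot as a literal dot.
matches() {
    printf '%s\n' "$1" | awk '{
        if ($0 ~ "^.*\\.txt$")
            print "match"
        else
            print "no match"
    }'
}

matches "notes.txt"    # literal ".txt" suffix
matches "notes_txt"    # no literal dot before "txt"
```

With the doubled backslash, the first call should report a match and the
second should not, whichever awk is installed.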
> Adding a warning, especially in the way that was proposed (IIUC), will
> potentially make many valid scripts, including existing scripts, spew
> unnecessary warnings.
"Valid" in the sense that demons flying out of your nose is valid. The
behavior is undefined. Warning about undefined behavior is a good
thing.
> Perhaps if the warning were made significantly more intelligent then its
> warnings might prove to be useful, but only in that case. I would strongly
> suggest that warnings MUST NOT be given for properly escaped regular
> expressions which are expressed as string constants.
This paragraph does not make any sense.
> (*)
> [ranting about gawk deleted]
>
> Perhaps what would be more fruitful would be
> for someone to propose and implement a fix for gawk to prevent it from
> recognizing C special character escapes in regular expressions and for it
> to treat string constants as pure C-like strings and thus hopefully
> eventually eliminate this glaring difference between gawk and most/all
> other AWK implementations.
This does not make any sense either.
--
David A. Holland
dholland%netbsd.org@localhost