Subject: Re: AWK vs. gawk.
To: None <netbsd-help@netbsd.org, current-users@netbsd.org>
From: Richard Rauch <rkr@olib.org>
List: current-users
Date: 05/12/2004 05:18:55
On Tue, May 11, 2004 at 05:30:50AM -0500, Richard Rauch wrote:
> Right around the time that NetBSD -current switched from using gawk to
> using nawk (I think) as the system AWK, I had written a smallish
> AWK script to parse Doxygen documentation and spit out man pages.
> (Doxygen can generate *roff, but it's not really usable for man
[...]
> I am particularly bothered by a missing feature from the NetBSD
> AWK, in the regular expressions: I need to anchor some of my
> matches to beginnings or ends of strings. For example, one of
> my gensub calls is:
>
> ret = gensub ("^[ ]*\\([ ]*", "", "g", ret );
Okay, the above is perfectly correct with gawk and with NetBSD's
awk, it seems.
The problem is that those weren't the lines giving me problems.
I didn't test carefully with a ~2.0 system. Or rather, the
problem lines weren't *exactly* like the above.
The problem shows up when the third argument is not "g", but is
rather 1 or "1". (Or any other integer.)
With gawk, for any integer n:
ret = gensub (<r.e.>, <replace>, n, src);
...you get the nth occurrence replaced, where n=1 is the
left-most occurrence.
With NetBSD's AWK, you get the (n+1)th, so n=0 is the left-most
occurrence.
Concretely:
echo "helloello" | awk '{print gensub ("ello", "i", 1, $0);}'
...prints "hiello" on gawk, and "helloi" on NetBSD's pre-2.0
-current (and presumably 2.0) awk.
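The one-liner above can be wrapped in a small probe that reports which
convention a given awk follows. This is only a sketch: the function name
probe_gensub and the three labels it prints are made up for illustration,
and it assumes that an awk without gensub() either exits with an error or
prints something other than the two strings shown above.

```shell
# Hypothetical probe: classify the gensub() numbering of an awk binary.
# $1 is the awk to test (defaults to "awk").
probe_gensub() {
    # Errors (e.g. gensub not implemented) are silenced and treated as "no result".
    out=$(echo "helloello" | "${1:-awk}" '{ print gensub("ello", "i", 1, $0) }' 2>/dev/null) || out=""
    case "$out" in
        hiello) echo "gawk-style" ;;   # n=1 replaced the left-most match
        helloi) echo "off-by-one" ;;   # n=1 replaced the second match
        *)      echo "unavailable" ;;  # no gensub(), or some other behavior
    esac
}

probe_gensub
```

A script that must run on both systems could call the probe once at startup
and adjust n accordingly.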
Since I understand gensub() to be a GNUism, and cannot find any
outstanding bug report on this, I'm going to file a bug report
after I send this message. I'm CC'ing to current-users since
I assume that it will be of particular interest there. (Although
I'm running -current, it's a pre-2.0 -current, which is why I
originally posted to this list.)
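Until the numbering agrees, one way to sidestep gensub() for the
"replace only the nth match" case is to build it from match() and
substr(), which POSIX awk provides. The following is a rough sketch, not a
drop-in replacement: the names replace_nth and subnth are invented here,
the replacement is taken literally (no "&" or "\\1" handling), and because
the string is consumed piece by piece, a "^" anchor would also match at
the start of each remaining piece.

```shell
# Hypothetical helper: replace only the n-th (1-based) match of a regex
# on each input line, without relying on gensub().
# Usage: ... | replace_nth REGEX REPLACEMENT N
replace_nth() {
    awk -v re="$1" -v rep="$2" -v n="$3" '
    # Walk the string match by match; "out" collects the part already scanned.
    function subnth(re, rep, n, s,    out, i) {
        out = ""
        for (i = 1; match(s, re); i++) {
            if (RLENGTH == 0)      # empty match: stop rather than loop forever
                break
            if (i == n)            # this is the match to replace
                return out substr(s, 1, RSTART - 1) rep substr(s, RSTART + RLENGTH)
            out = out substr(s, 1, RSTART + RLENGTH - 1)
            s = substr(s, RSTART + RLENGTH)
        }
        return out s               # fewer than n matches: line is unchanged
    }
    { print subnth(re, rep, n, $0) }'
}

echo "helloello" | replace_nth "ello" "i" 1    # prints "hiello"
echo "helloello" | replace_nth "ello" "i" 2    # prints "helloi"
```

Note that awk processes backslash escapes in -v assignments, so a regex
containing backslashes needs them doubled when passed this way.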
--
"I probably don't know what I'm talking about." http://www.olib.org/~rkr/