tech-userlevel archive
grep and options that print matches
Hi all,
while looking at the BSD grep code again, I stumbled over the
inconsistent behavior of GNU grep when there are overlapping matches.
If either --color or --only-matching is specified, it becomes important
to decide which part of a line matches and, when more than one
expression is used, in which order. A few examples to show the issue:
echo abcde | grep -o -e 'ab' -e 'cde'
This prints two lines, "ab" and "cde". This is expected behavior.
echo abcde | grep -o -e 'abc' -e 'cde'
This prints one line, "abc". IMO this is wrong -- the second pattern
certainly matches the input line and should be printed as well.
echo abcdeabc | grep -o -e 'ab' -e 'cde'
This prints three lines. Newer grep versions justify this with a
change in the man page (-o prints each match on a separate line). That
doesn't exactly explain the order ("ab", "cde", "ab"), though. I would
consider "ab", "ab", "cde" a quite a bit more logical output.
echo abc | grep -o -e '..'
This prints one line, "ab". This means the match is greedy, even though
it is documented nowhere.
echo abcd | grep -o -e '..' -e '.*'
This prints one line, "abcd". So the longest match wins.
echo abcd | grep -o -e '..' -e 'b.*'
...but only if they start at the same place.
Color output uses the same rules.
To summarize, match selection in GNU grep is an earliest, longest
match with no overlap. Now the important question: does that make sense,
and do we want that behavior for BSD grep?
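
For reference, the rule as summarized above can be modeled in a few lines.
This is only a sketch of the observed behavior, not GNU grep's actual
implementation: select_matches is a hypothetical helper, and it uses
Python's re module, whose syntax and alternation semantics differ from
POSIX regexes for anything more complex than the patterns used here.

```python
import re


def select_matches(line, patterns):
    """Model of 'earliest, longest match with no overlap' (an assumption
    based on the observed -o output, not taken from the grep sources)."""
    compiled = [re.compile(p) for p in patterns]
    results = []
    pos = 0
    while pos < len(line):
        # Among all patterns, find the longest match starting exactly at pos.
        best_end = pos
        for rx in compiled:
            m = rx.match(line, pos)
            if m is not None and m.end() > best_end:
                best_end = m.end()
        if best_end > pos:
            results.append(line[pos:best_end])
            pos = best_end   # resume after the match, so nothing overlaps
        else:
            pos += 1         # no non-empty match starts here; move on
    return results


if __name__ == "__main__":
    print(select_matches("abcde", ["abc", "cde"]))
    print(select_matches("abcdeabc", ["ab", "cde"]))
```

With this model, 'abc' and 'cde' on "abcde" yield only "abc", because "cde"
would have to start inside the text already consumed, and 'ab' and 'cde' on
"abcdeabc" come out in input order rather than grouped by pattern. For the
last example ('..' and 'b.*' on "abcd") the model yields "ab" and "cd".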
Joerg