Subject: GNU metacharacter support for bsdgrep-devel
To: None <tech-userlevel@netbsd.org>
From: Bruce J.A. Nourish <bjan+tech-userlevel@bjan.net>
List: tech-userlevel
Date: 01/20/2004 01:23:53
Hey everyone,

Breaking my long tradition of merely slagging off other people's work, 
rather than doing any myself, I give you support for GNU grep's 
metacharacters in bsdgrep. I hope this brings the prospect of a free grep
in the NetBSD base system a step closer to reality. Specifically,
we now grok:

\< and \> - word begin and word end
\w (\W)   - (non-) word characters
\b (\B)   - (non-) word begin or end

This is done by preprocessing the pattern before it gets regcomp(3)'d.
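
For anyone who wants to play with the idea outside of grep, here is a
minimal standalone sketch of that preprocessing step. It is not the code
from the patch: the names gnu_escape_replacement() and
expand_gnu_escapes() are made up for illustration, it builds a new string
instead of growing the old one in place, and it assumes GNU's \w also
covers underscore. It copies any other escape (including an escaped
backslash) through untouched, and it leaves \b and \B out, since
re_format(7) only documents [[:<:]] and [[:>:]] as standalone bracket
expressions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: map the character after a backslash to its
 * replacement, or return NULL if the escape is not one we rewrite.
 */
static const char *
gnu_escape_replacement(char c)
{
	switch (c) {
	case '<': return "[[:<:]]";        /* start of word (BSD regex) */
	case '>': return "[[:>:]]";        /* end of word (BSD regex) */
	case 'w': return "[[:alnum:]_]";   /* assumes \w includes '_' */
	case 'W': return "[^[:alnum:]_]";
	default:  return NULL;
	}
}

/*
 * Hypothetical helper: return a newly malloc'd copy of pat with the
 * escapes above expanded. Returns NULL if malloc fails; the caller
 * frees the result.
 */
char *
expand_gnu_escapes(const char *pat)
{
	size_t len, out = 0;
	const char *p, *rep;
	char *buf;

	assert(pat != NULL);
	len = strlen(pat);

	/*
	 * The longest replacement turns 2 input bytes into 13, so 7
	 * output bytes per input byte plus the NUL is always enough.
	 */
	buf = malloc(len * 7 + 1);
	if (buf == NULL)
		return NULL;

	for (p = pat; *p != '\0'; p++) {
		if (*p == '\\' && p[1] != '\0') {
			rep = gnu_escape_replacement(p[1]);
			if (rep != NULL) {
				memcpy(buf + out, rep, strlen(rep));
				out += strlen(rep);
			} else {
				/* Copy both bytes so the second one is
				 * never re-read as the start of an escape. */
				buf[out++] = p[0];
				buf[out++] = p[1];
			}
			p++;	/* skip the escaped character */
			continue;
		}
		buf[out++] = *p;
	}
	buf[out] = '\0';
	return buf;
}
```

Feeding it "\<foo\>" should give back "[[:<:]]foo[[:>:]]", which a regcomp(3)
that understands those bracket expressions can then compile as usual.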

Note that I am a novice C programmer and you should review the patch for
correctness before you do anything crazy, like applying it. I diffed 
against the *unpatched* source that results from doing "make extract" in 
textproc/bsdgrep-devel.

--- grep.c.orig	2003-11-08 13:10:08.000000000 -0700
+++ grep.c
@@ -37,6 +37,7 @@ __RCSID("$NetBSD: grep.c,v 1.43 2003/11/
 #include <sys/types.h>
 #include <sys/stat.h>
 
+#include <zlib.h>
 #include <err.h>
 #include <errno.h>
 #include <getopt.h>
@@ -184,6 +185,49 @@ struct option long_options[] =
 	{NULL,                  no_argument,       NULL, 0}
 };
 
+static char *
+expand(char *pat, size_t *len)
+{
+	char *tmp, *rep;
+	size_t prelen, replen, postlen;
+
+	tmp = pat;
+	while((tmp = strchr(tmp, '\\')) != NULL) {
+		switch(*++tmp) {
+		case '<':
+			rep = "[[:<:]]";
+			break;
+		case '>':
+			rep = "[[:>:]]";
+			break;
+		case 'w':
+			rep = "[[:alnum:]_]";
+			break;
+		case 'W':
+			rep = "[^[:alnum:]_]";
+			break;
+		case 'b':
+			rep = "[[:<:][:>:]]";
+			break;
+		case 'B':
+			rep = "[^[:<:][:>:]]";
+			break;
+		default:
+			continue;
+		}
+		replen = strlen(rep);
+		prelen = tmp - pat - 1;
+		postlen = *len - prelen - 2;
+		pat = grep_realloc(pat, *len + replen);
+		tmp = pat + prelen;
+		memmove(tmp + replen, tmp + 2, postlen + 1);	/* + 1 moves the NUL too */
+		memcpy(tmp, rep, replen);
+		*len = prelen + replen + postlen;
+		tmp += replen;
+	}
+	return pat;
+}
+
 static void
 add_pattern(char *pat, size_t len)
 {
@@ -200,6 +244,8 @@ add_pattern(char *pat, size_t len)
 	pattern[patterns] = grep_malloc(len + 1);
 	strncpy(pattern[patterns], pat, len);
 	pattern[patterns][len] = '\0';
+	if (!Fflag)
+		pattern[patterns] = expand(pattern[patterns], &len);
 	++patterns;
 }
 
-- 
Bruce J.A. Nourish <bjan+public@bjan.net> http://bjan.freeshell.org