Subject: Re: the state of regex(3)
To: Alistair Crooks <agc@pkgsrc.org>
From: Ian Lance Taylor <ian@airs.com>
List: tech-userlevel
Date: 09/30/2004 16:52:23
Alistair Crooks <agc@pkgsrc.org> writes:
> There are certain restrictions on POSIX regular expressions; one is
> that the *USER* should keep their expressions below 256 characters in
> length, to keep them portable - see re_format(7). Whilst NetBSD's
> regex code can handle longer expressions, someone else's
> POSIX-conformant code may not. One drawback of standards going for
> the lowest common denominator. Are you going to add a warning message
> to regcomp(3) for every regexp which is 256 chars or greater, just so
> that POSIX-conformance is assured?
I don't follow this. POSIX doesn't prohibit a POSIX-conformant
implementation from supporting larger regexps; it merely requires that
a POSIX-conformant application avoid using them. While one could add a
warning to regcomp to help people write POSIX-conformant applications,
such a warning should be optional.
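To make "optional" concrete, here is a minimal sketch of a wrapper that
warns only when the caller asks for it. The names regcomp_warn,
re_exceeds_portable_len and PORTABLE_RE_MAX are invented for this
example and are not part of any libc; the 256 figure is simply the
limit quoted above.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the 256-character portability bound quoted above. */
#define PORTABLE_RE_MAX 256

/* Hypothetical helper: 1 if the pattern is 256 characters or longer. */
static int
re_exceeds_portable_len(const char *pattern)
{
	assert(pattern != NULL);
	return strlen(pattern) >= PORTABLE_RE_MAX;
}

/*
 * Hypothetical wrapper around regcomp(3). The pattern is always
 * compiled; the warning is printed only when `warn' is non-zero,
 * so long regexps keep working either way.
 */
static int
regcomp_warn(regex_t *preg, const char *pattern, int cflags, int warn)
{
	if (warn && re_exceeds_portable_len(pattern))
		fprintf(stderr, "warning: regexp is %zu characters; "
		    "portable applications should stay below %d\n",
		    strlen(pattern), PORTABLE_RE_MAX);
	return regcomp(preg, pattern, cflags);
}
```

A caller that cares about portability passes warn = 1; everyone else
passes 0 and sees no change in behaviour.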
> There is also POSIX_MISTAKE - take a look at src/lib/libc/regex/regcomp.c
>
> #ifndef POSIX_MISTAKE
> 	case ')':	/* happens only if no current unmatched ( */
> 		/*
> 		 * You may ask, why the ifndef? Because I didn't notice
> 		 * this until slightly too late for 1003.2, and none of the
> 		 * other 1003.2 regular-expression reviewers noticed it at
> 		 * all. So an unmatched ) is legal POSIX, at least until
> 		 * we can get it fixed.
> 		 */
> 		SETERROR(REG_EPAREN);
> 		break;
> #endif
>
> Fine, let's conform to a standard that got it wrong.
For what it's worth, the GNU approach is to check the environment
variable POSIXLY_CORRECT, and only strictly adhere to the standard
when that variable is defined. A somewhat similar case is ISO C
trigraphs--you don't normally want your compiler to implement
trigraphs, which basically mung your strings in weird ways, but
support is required for ISO C conformance; gcc only implements them
when asked to, for example with the -ansi or -trigraphs option.
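As a rough sketch of how that approach could apply to the unmatched ')'
case quoted above: the compile-time #ifndef becomes a run-time check.
The functions posixly_correct and unmatched_rparen_error are
hypothetical names for this example and do not come from regcomp.c;
only getenv(3) and REG_EPAREN are real.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Returns 1 when the user has asked for strict POSIX behaviour. */
static int
posixly_correct(void)
{
	/* Only whether the variable is set matters, not its value. */
	return getenv("POSIXLY_CORRECT") != NULL;
}

/*
 * Hypothetical policy function for a ')' with no matching '('.
 * In strict mode it is accepted, since the quoted comment says an
 * unmatched ) is legal POSIX; otherwise it is reported as REG_EPAREN.
 * Returns 0 for "no error".
 */
static int
unmatched_rparen_error(int strict)
{
	assert(strict == 0 || strict == 1);
	return strict ? 0 : REG_EPAREN;
}
```

The parser would then call unmatched_rparen_error(posixly_correct())
where the #ifndef block sits today, so the default stays sensible and
strict conformance is one environment variable away.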
> I don't get the POSIX religious thing, especially when it's a flawed
> standard.
I think that POSIX conformance, albeit user-controlled, is desirable.
If nothing else, it permits writing highly portable application code.
And it is a selling point for NetBSD.
Ian