Subject: Re: the state of regex(3)
To: Jason Thorpe <thorpej@shagadelic.org>
From: Alistair Crooks <agc@pkgsrc.org>
List: tech-userlevel
Date: 09/30/2004 21:37:58

On Wed, Sep 29, 2004 at 03:03:04PM -0700, Jason Thorpe wrote:
>
> On Sep 28, 2004, at 2:17 PM, Alistair Crooks wrote:
>
> >If Jason could help me out and tell me exactly what the sticking point
> >is, I'd be grateful.
>
> The sticking point is -- If we replace our regex with PCRE, then we can
> never pass a POSIX test suite if it happens to test the incompatible
> feature (which any comprehensive one should). I think that could be a
> major issue for some users of the system.

You are assuming that NetBSD could pass a POSIX test suite now.
It can't.

There are certain restrictions on POSIX regular expressions; one is
that the *USER* should keep their expressions to 256 bytes or less,
to keep them portable - see re_format(7). Whilst NetBSD's regex code
can handle longer expressions, someone else's POSIX-conformant code
may not. That is one drawback of standards going for the lowest
common denominator. Are you going to add a warning message to
regcomp(3) for every regexp longer than 256 bytes, just so that
POSIX-conformance is assured?
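
For the sake of argument, such a warning might look something like the
sketch below. The names regcomp_warn() and PORTABLE_RE_MAX are made up
for illustration and exist nowhere in libc; the 256-byte figure is the
one from re_format(7).

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Portable length ceiling from re_format(7); the macro name is invented here. */
#define PORTABLE_RE_MAX 256

/* Return 1 if the pattern is within the portable length, 0 otherwise. */
static int
re_is_portable_length(const char *pattern)
{
        return strlen(pattern) <= PORTABLE_RE_MAX;
}

/*
 * Hypothetical wrapper: compiles exactly as regcomp(3) does, but
 * complains on stderr first if the pattern is longer than is portable.
 */
static int
regcomp_warn(regex_t *preg, const char *pattern, int cflags)
{
        if (!re_is_portable_length(pattern))
                fprintf(stderr,
                    "warning: RE is %zu bytes, portable limit is %d\n",
                    strlen(pattern), PORTABLE_RE_MAX);
        return regcomp(preg, pattern, cflags);
}
```

Apart from the noise on stderr for long patterns, calling
regcomp_warn(&re, pattern, REG_EXTENDED) behaves just like regcomp(3).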

There is also POSIX_MISTAKE - take a look at src/lib/libc/regex/regcomp.c:

#ifndef POSIX_MISTAKE
        case ')':               /* happens only if no current unmatched ( */
                /*
                 * You may ask, why the ifndef?  Because I didn't notice
                 * this until slightly too late for 1003.2, and none of the
                 * other 1003.2 regular-expression reviewers noticed it at
                 * all.  So an unmatched ) is legal POSIX, at least until
                 * we can get it fixed.
                 */
                SETERROR(REG_EPAREN);
                break;
#endif

Fine, let's conform to a standard that got it wrong.
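
If you want to see which way a given libc jumps, a probe along these
lines should do. It is only a sketch: ere_compile_status() and
report_unmatched_paren() are names invented here, and the answer for a
lone ')' depends on the implementation, so no particular output is
promised.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Compile an extended RE and return regcomp(3)'s result:
 * 0 on success, otherwise one of the REG_* error codes.
 */
static int
ere_compile_status(const char *pattern)
{
        regex_t re;
        int rc;

        rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
        if (rc == 0)
                regfree(&re);
        return rc;
}

/* Report how this libc treats a ')' with no matching '('. */
static void
report_unmatched_paren(void)
{
        int rc = ere_compile_status(")");

        if (rc == 0)
                printf("lone ')' accepted as an ordinary character\n");
        else
                printf("lone ')' rejected, error code %d\n", rc);
}
```

An unmatched '(' is an error everywhere I would expect, so
ere_compile_status("(a") should come back non-zero; it is only the
lone ')' where implementations differ.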

And so back to PCRE - unfortunately, you deleted the section which
showed how to get REG_NEWLINE characteristics. Besides, these are
Perl-compatible regular expressions, which seem to be much more in
demand than POSIX ones.
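
For reference, what REG_NEWLINE buys you on the POSIX side is that ^
and $ also match around embedded newlines, and that . stops matching a
newline. The sketch below shows only that POSIX behaviour, using a
helper name, ere_matches(), invented for the purpose; it says nothing
about how the PCRE side is spelled.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if the extended RE matches somewhere in text, 0 if it
 * does not, and -1 if the pattern fails to compile.  extra_cflags
 * is OR-ed into the regcomp(3) flags, e.g. REG_NEWLINE.
 */
static int
ere_matches(const char *pattern, const char *text, int extra_cflags)
{
        regex_t re;
        int rc;

        if (regcomp(&re, pattern,
            REG_EXTENDED | REG_NOSUB | extra_cflags) != 0)
                return -1;
        rc = regexec(&re, text, 0, NULL, 0);
        regfree(&re);
        return rc == 0;
}
```

Against the string "a\nb\nc", the pattern "^b$" finds nothing with no
extra flags, and matches once REG_NEWLINE is passed.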

I don't get the POSIX religious thing, especially when it's a flawed
standard.

Regards,
Alistair