On Mon, Dec 03, 2012 at 04:06:06AM +0100, Alistair Crooks wrote:
> On Sun, Dec 02, 2012 at 07:35:51PM -0500, Thomas Dickey wrote:
> > On Mon, Dec 03, 2012 at 12:16:57AM +0100, Alistair Crooks wrote:
> > >
> > > It's long been a pet peeve of mine that regexp matching for word
> > > boundaries has an annoying dependency on the implementation.
> > >
> > > My thanks to the many people who helped out with this table, reproduced
> > > below, which shows what works, and what doesn't work, when attempting
> > > to match the zero-width pattern at a word boundary.
> >
> > \b is a perl feature
> >
> > man perlre explains about that, and \<
> > (BRE's versus ERE's, essentially).
>
> Let's look at what happens:
>
> vile-9.8nb1 on NetBSD/amd64 6.99.10, vile /usr/share/dict/words
> /\<arch
> cursor is placed at the start of the word "arch" on line 12397, as I
> would expect.
>
> vile-9.8e on FreeBSD/amd64 9.0-RC1 (I know, I know), vile
> /usr/share/dict/words
> /\<arch
> cursor is placed on the "arch" in "agonistarch" on line 4109
> i.e. \< as a word boundary is not respected.
> /\barch results in a "not found"
>
> now another try:

hmm - 9.8j is current.  I don't recall any recent regex changes or
related fixes.  I have 9.8i on FreeBSD 9, and don't see this behavior.
I'll make a to-do to investigate the port's configuration...

\b in vile means almost the same as \s

	\b is [[:blank:]]
	\s is [[:space:]]

so \s includes \r, \n and \f while \b does not.

(\b has been there since early 2001 - seems that I added it as part
of the character-class changes).

> vile-9.8e on FreeBSD/amd64 9.0-RC1, vile /etc/motd
> The text reads
> ...
> Welcome to FreeBSD!
> ...
> /\<to
> Results in "not found"
> so now let's use the one derived from perl regexps
> /\bto
> and the word "to" is found.
> (unfortunately, the cursor is placed on the space before the word "to".
> So, it's not quite zero-width, and some people may find that close
> enough.
> Again, unfortunately, I'm not one of them).
>
> not quite what I'd expect from RTFM, but thanks for the suggestion.
>
> > > regexp word boundaries
> > >                 \<      \b      [[:<:]]
> > > perl            not     works   not
> >
> > (see manpage, as noted)
>
> I think it would probably be best if you viewed what I wrote as a general
> criticism of the trainwreck that is regexp word boundary matching, rather
> than pointing me at a manual page for one of the programs involved.

sure - but reading it, I look for things to improve (or fix).

\< should work because it's the most standardized.
\b is perl (perl will never be standardized - so I've read :-)
[[:<:]] is... (let's not digress)

I'd forgotten about \b in vile actually, but given compatibility issues
I could change its behavior to more closely match perl's (and added it
to my to-do list to investigate).  Doing that would lose the nice feature
that all of the character classes have an abbreviation.

> > > freebsd vile    not     works   not
> > > netbsd vile     works   not     not
> >
> > without version numbers, I can only guess what you're referring to with
> > vile.  \< has been part of vile for a long time; \b is different from perl
> > (vile matches whitespace rather than a word boundary).  Both are in the
> > help-file.  See
> >
> > 	http://invisible-island.net/vile/vile-toc.html
> > 	http://invisible-island.net/vile/vile-hlp.html#regular-expressions2
>
> Thanks - I remember fixing the \< zero-width matching in the mid 1990s
> on vile, and Paul merged the fix.  Unfortunately, your change log only
> goes back as far as 1999 when the license was changed to the GPL (and
> when I stopped working on vile), so there's no record of anything going
> back that far.

All of the changelogs are in the sources - the practice used to be
that we would rename CHANGES to CHANGES.Rx, but I stopped doing that
a while back (filesizes aren't as important).

-- 
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net
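
For readers following the thread, the difference between a zero-width
word boundary and a whitespace class can be sketched outside of vile.
This is only an illustration in Python's re module, not vile's regex
engine: the [ \t] pattern is an assumed stand-in for vile's \b as
described above ([[:blank:]]), and the lookbehind is one assumed way
to emulate \<.

```python
import re

motd = "Welcome to FreeBSD!"
words = "agonistarch arch"

# Perl-style \b is a zero-width assertion: nothing is consumed,
# so the match begins on the "t" of "to".
perl_style = re.search(r"\bto", motd)

# Assumed stand-in for vile's \b ([[:blank:]], i.e. space or tab):
# the blank is part of the match, so it begins one character earlier.
blank_style = re.search(r"[ \t]to", motd)

# Without any boundary, "arch" is first found inside "agonistarch".
plain = re.search(r"arch", words)

# Emulating \< (start of word) with a negative lookbehind:
# the match must not be preceded by a word character.
start_of_word = re.search(r"(?<!\w)arch", words)

print("perl-style \\b starts at", perl_style.start())
print("blank class starts at", blank_style.start())
print("plain search starts at", plain.start())
print("start-of-word search starts at", start_of_word.start())
```

The one-character difference between the first two start offsets
corresponds to the cursor landing on the space before "to" in the
/etc/motd example.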