Source-Changes-HG archive
[src/trunk]: src/external/bsd/tre/dist Import tre from https://github.com/lau...
details: https://anonhg.NetBSD.org/src/rev/27293283f233
branches: trunk
changeset: 357604:27293283f233
user: rin <rin%NetBSD.org@localhost>
date: Fri Nov 17 16:11:11 2017 +0000
description:
Import tre from https://github.com/laurikari/tre as of 20171117:
- tre_reg*b() functions are added, that take bytes literally.
- minor bug fixes
diffstat:
external/bsd/tre/dist/Makefile.am | 6 +-
external/bsd/tre/dist/README.md | 290 +++++++++++
external/bsd/tre/dist/include/tre/tre-config.h | 52 ++
external/bsd/tre/dist/include/tre/tre.h | 276 +++++++++++
external/bsd/tre/dist/lib/Makefile.am | 1 -
external/bsd/tre/dist/lib/tre-filter.c | 73 ++
external/bsd/tre/dist/lib/tre-filter.h | 19 +
external/bsd/tre/dist/po/fi.po | 3 +-
external/bsd/tre/dist/po/sv.po | 3 +-
external/bsd/tre/dist/python/setup.py | 4 +-
external/bsd/tre/dist/python/setup.py.in | 39 +
external/bsd/tre/dist/python/tre-python.c | 86 ++-
external/bsd/tre/dist/src/Makefile.am | 2 +-
external/bsd/tre/dist/tests/Makefile.am | 3 +-
external/bsd/tre/dist/tests/build-hosts/ahma | 1 +
external/bsd/tre/dist/tests/build-hosts/earthquake | 1 +
external/bsd/tre/dist/tests/build-hosts/hemuli | 1 +
external/bsd/tre/dist/tests/build-hosts/jolly | 14 +
external/bsd/tre/dist/tests/build-on-hosts.sh | 49 +
external/bsd/tre/dist/tests/build-run.sh | 27 +
external/bsd/tre/dist/utils/Makefile.am | 3 +-
external/bsd/tre/dist/utils/autogen.sh | 12 +-
external/bsd/tre/dist/utils/build-release.sh | 16 +
external/bsd/tre/dist/utils/build-sources.sh | 46 +
external/bsd/tre/dist/utils/replace-vars.sh | 30 +
external/bsd/tre/dist/vcbuild/tre.vcxproj | 98 +++
external/bsd/tre/dist/vcbuild/tre.vcxproj.filters | 89 +++
external/bsd/tre/dist/win32/retest.vcproj | 232 +++++++++
external/bsd/tre/dist/win32/tre-config.h.in | 52 ++
external/bsd/tre/dist/win32/tre.sln | 29 +
external/bsd/tre/dist/win32/tre.vcproj | 516 +++++++++++++++++++++
31 files changed, 2037 insertions(+), 36 deletions(-)
diffs (truncated from 2321 to 300 lines):
diff -r 09c6270f1874 -r 27293283f233 external/bsd/tre/dist/Makefile.am
--- a/external/bsd/tre/dist/Makefile.am Fri Nov 17 16:08:20 2017 +0000
+++ b/external/bsd/tre/dist/Makefile.am Fri Nov 17 16:11:11 2017 +0000
@@ -11,9 +11,9 @@
EXTRA_DIST = \
LICENSE \
win32/tre-config.h win32/config.h \
- win32/tre.dsw \
- win32/tre.dsp win32/tre.def \
- win32/retest.dsp \
+ win32/tre.vcproj \
+ win32/tre.sln \
+ win32/retest.vcproj \
python/tre-python.c \
python/setup.py \
python/example.py
diff -r 09c6270f1874 -r 27293283f233 external/bsd/tre/dist/README.md
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/external/bsd/tre/dist/README.md Fri Nov 17 16:11:11 2017 +0000
@@ -0,0 +1,290 @@
+Introduction
+============
+
+TRE is a lightweight, robust, and efficient POSIX compliant regexp
+matching library with some exciting features such as approximate
+(fuzzy) matching.
+
+TRE's matching algorithm runs in worst-case time that is linear in
+the length of the text being searched and quadratic in the length
+of the regular expression.
+
+In other words, the time complexity of the algorithm is O(M^2 N), where
+M is the length of the regular expression and N is the length of the
+text. The space used is also quadratic in the length of the regex, but
+does not depend on the searched string. This quadratic behaviour
+occurs only in pathological cases, which are probably very rare in
+practice.
+
+
+Hacking
+=======
+
+Here's how to work with this code.
+
+Prerequisites
+-------------
+
+You will need the following tools installed on your system:
+
+ - autoconf
+ - automake
+ - gettext
+ - libtool
+ - zip (optional)
+
+
+Building
+--------
+
+First, prepare the tree. Change to the root of the source directory
+and run
+```
+./utils/autogen.sh
+```
+This will regenerate various things using the prerequisite tools so
+that you end up with a buildable tree.
+
+After this, you can run the configure script and build TRE as usual:
+```
+./configure
+make
+make check
+make install
+```
+
+
+Building a source code package
+------------------------------
+
+In a prepared tree, this command creates a source code tarball:
+```
+./configure && make dist
+```
+
+Alternatively, you can run
+```
+./utils/build-sources.sh
+```
+which builds the source code packages and puts them in the `dist`
+subdirectory. This script needs a working `zip` command.
+
+
+Features
+========
+
+TRE is not just yet another regexp matcher. TRE has some features
+which are not there in most free POSIX compatible implementations.
+Most of these features are not present in non-free implementations
+either, for that matter.
+
+Approximate matching
+--------------------
+
+Approximate pattern matching accepts matches that are close to the
+searched pattern under some measure of closeness rather than
+requiring an exact match. TRE uses the edit-distance measure (also
+known as the Levenshtein distance) where characters can be
+inserted, deleted, or substituted in the searched text in order to
+get an exact match.
+
+Each insertion, deletion, or substitution adds to the distance, or cost,
+of the match. TRE can report the matches which have a cost lower than
+some given threshold value. TRE can also be used to search for matches
+with the lowest cost.
+
+TRE includes a version of the agrep (approximate grep) command line
+tool for approximate regexp matching in the style of grep. Unlike
+other agrep implementations (like the one by Sun Wu and Udi Manber
+from the University of Arizona) TRE agrep allows full regexps of any
+length, any number of errors, and non-uniform costs for insertion,
+deletion and substitution.
+
+Strict standard conformance
+---------------------------
+
+POSIX defines the behaviour of regexp functions precisely. TRE
+attempts to conform to these specifications as strictly as possible.
+TRE always returns the correct matches for subpatterns, for example.
+Very few other implementations do this correctly. In fact, the only
+other implementations besides TRE that I am aware of (free or not)
+that get it right are Rx by Tom Lord, Regex++ by John Maddock, and the
+AT&T ast regex by Glenn Fowler and Doug McIlroy.
+
+The standard TRE tries to conform to is the IEEE Std 1003.1-2001,
+or Open Group Base Specifications Issue 6, commonly referred to as
+"POSIX". It can be found online here. The relevant parts are the
+base specifications on regular expressions (and the rationale) and
+the description of the regcomp() API.
+
+For an excellent survey on POSIX regexp matchers, see the testregex
+pages by Glenn Fowler of AT&T Labs Research.
+
+Predictable matching speed
+--------------------------
+
+Because of the matching algorithm used in TRE, the maximum time
+consumed by any regexec() call is always directly proportional to
+the length of the searched string. There is one exception: if back
+references are used, the matching may take time that grows
+exponentially with the length of the string. This is because
+matching back references is an NP-complete problem, and almost
+certainly requires exponential time to match in the worst case.
+
+Predictable and modest memory consumption
+-----------------------------------------
+
+A regexec() call never allocates memory from the heap. TRE
+allocates all the memory it needs during a regcomp() call, and some
+temporary working space from the stack frame for the duration of
+the regexec() call. The amount of temporary space needed is
+constant during matching and does not depend on the searched
+string. For regexps of reasonable size TRE needs less than 50K of
+dynamically allocated memory during the regcomp() call, less than
+20K for the compiled pattern buffer, and less than two kilobytes of
+temporary working space from the stack frame during a regexec()
+call. There is no time/memory tradeoff. TRE is also small in code
+size; statically linking with TRE increases the executable size
+by less than 30K (gcc-3.2, x86, GNU/Linux).
+
+Wide character and multibyte character set support
+--------------------------------------------------
+
+TRE supports multibyte character sets. This makes it possible to
+use regexps seamlessly with, for example, Japanese locales. TRE
+also provides a wide character API.
+
+Binary pattern and data support
+-------------------------------
+
+TRE provides APIs which allow binary zero characters both in
+regexps and searched strings. The standard API cannot be easily
+used to, for example, search for printable words from binary data
+(although it is possible with some hacking). Searching for patterns
+which contain binary zeroes embedded is not possible at all with
+the standard API.
+
+Completely thread safe
+----------------------
+
+TRE is completely thread safe. All the exported functions are
+re-entrant, and a single compiled regexp object can be used
+simultaneously in multiple contexts; e.g. in main() and a signal
+handler, or in many threads of a multithreaded application.
+
+Portable
+--------
+
+TRE is portable across multiple platforms. Here's a table of
+platforms and compilers that have been successfully used to compile
+and run TRE:
+
+<table>
+ <tr><th>Platform(s)</th> <th>Compiler(s)</th></tr>
+ <tr><td>AIX 4.3.2 - 5.3.0</td> <td>GCC, C for AIX compiler version 5</td></tr>
+ <tr><td>Compaq Tru64 UNIX V5.1A/B</td> <td>Compaq C V6.4-014 - V6.5-011</td></tr>
+ <tr><td>Cygwin 1.3 - 1.5</td> <td>GCC</td></tr>
+ <tr><td>Digital UNIX V4.0</td> <td>DEC C V5.9-005</td></tr>
+ <tr><td>FreeBSD 4 and above</td> <td>GCC</td></tr>
+ <tr><td>GNU/Linux systems on x86, x86_64, ppc64, s390</td><td>GCC</td></tr>
+ <tr><td>HP-UX 10.20 - 11.00</td> <td>GCC, HP C Compiler</td></tr>
+ <tr><td>IRIX 6.5</td> <td>GCC, MIPSpro Compilers 7.3.1.3m</td></tr>
+ <tr><td>Mac OS X</td></tr>
+ <tr><td>NetBSD 1.5 and above</td> <td>GCC, egcs</td></tr>
+ <tr><td>OpenBSD 3.3 and above</td> <td>GCC</td></tr>
+ <tr><td>Solaris 2.7-10 sparc/x86</td> <td>GCC, Sun Workshop 6 compilers</td></tr>
+ <tr><td>Windows 98 - XP</td> <td>Microsoft Visual C++ 6.0</td></tr>
+</table>
+
+
+TRE 0.7.5 should compile without changes on all of the above
+platforms. Tell me if you are using TRE on a platform that is not
+listed above, and I'll add it to the list. Also let me know if TRE
+does not work on a listed platform.
+
+Depending on the platform, you may need to install libutf8 to get
+wide character and multibyte character set support.
+
+Free
+----
+
+TRE is released under a license which is essentially the same as
+the "2 clause" BSD-style license used in NetBSD. See the file
+LICENSE for details.
+
+Roadmap
+-------
+
+There are currently two features, both related to collating
+elements, missing from 100% POSIX compliance. These are:
+
+* Support for collating elements (e.g. [[.<X>.]], where <X> is a
+ collating element). It is not possible to support
+ multi-character collating elements portably, since POSIX does
+ not define a way to determine whether a character sequence is a
+ multi-character collating element or not.
+
+* Support for equivalence classes, for example [[=<X>=]], where
+ <X> is a collating element. An equivalence class matches any
+ character which has the same primary collation weight as
+ <X>. Again, POSIX provides no portable mechanism for
+ determining the primary collation weight of a collating
+ element.
+
+Note that other portable regexp implementations don't support
+collating elements either. The single exception is Regex++, which
+comes with its own database for collating elements for different
+locales. Support for collating elements and equivalence classes has
+not been widely requested and is not very high on the TODO list at
+the moment.
+
+These are other features I'm planning to implement real soon now:
+
+* All the missing GNU extensions enabled in GNU regex, such as
+ [[:<:]] and [[:>:]]
+
+* A REG_SHORTEST regexec() flag for returning the shortest match
+ instead of the longest match.
+
+* Perl-compatible syntax
+  * `[:^class:]`
+    Matches anything but the characters in class. Note that
+    [^[:class:]] works already; this would be just a
+    convenience shorthand.
+  * `\A`
+    Match only at beginning of string
+  * `\Z`
+    Match only at end of string, or before newline at the end
+  * `\z`
+    Match only at end of string
+  * `\l`
+    Lowercase next char (think vi)
+  * `\u`
+    Uppercase next char (think vi)
+  * `\L`
+    Lowercase till \E (think vi)
+  * `\U`
+    Uppercase till \E (think vi)
+  * `(?=pattern)`
+    Zero-width positive look-ahead assertions.
+  * `(?!pattern)`
+    Zero-width negative look-ahead assertions.
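
Illustration (not part of the imported change): the README's "Approximate matching" section describes an edit-distance cost built from insertions, deletions, and substitutions, with possibly non-uniform costs for each. The sketch below shows that cost model on two plain strings. It is not TRE's code and does not use TRE's API; `struct edit_costs` and `edit_distance` are hypothetical names made up for this example, and comparing two whole strings is a much simpler problem than matching a regexp approximately against a text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cost table for this sketch only; not a TRE type. */
struct edit_costs {
    int ins;  /* cost of inserting one character */
    int del;  /* cost of deleting one character */
    int sub;  /* cost of substituting one character for another */
};

/*
 * Minimum total cost of the edits that turn string a into string b.
 * Classic dynamic programme, keeping a single row of the table.
 * Returns -1 if memory cannot be allocated.
 */
int edit_distance(const char *a, const char *b, struct edit_costs c)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a), lb = strlen(b);
    int *row = malloc((lb + 1) * sizeof *row);
    if (row == NULL)
        return -1;

    /* Row 0: the empty prefix of a becomes b[0..j) by j insertions. */
    for (size_t j = 0; j <= lb; j++)
        row[j] = (int)j * c.ins;

    for (size_t i = 1; i <= la; i++) {
        int diag = row[0];           /* value at (i-1, j-1) */
        row[0] = (int)i * c.del;     /* a[0..i) becomes "" by i deletions */
        for (size_t j = 1; j <= lb; j++) {
            int up = row[j];         /* value at (i-1, j) */
            /* Keep or substitute the current pair of characters. */
            int best = diag + (a[i - 1] == b[j - 1] ? 0 : c.sub);
            /* Delete a[i-1]. */
            if (up + c.del < best)
                best = up + c.del;
            /* Insert b[j-1]. */
            if (row[j - 1] + c.ins < best)
                best = row[j - 1] + c.ins;
            diag = up;
            row[j] = best;
        }
    }

    int result = row[lb];
    free(row);
    return result;
}
```

With unit costs this is the ordinary Levenshtein distance, so turning "kitten" into "sitting" costs 3. A caller wanting the thresholded behaviour the README mentions would accept a candidate only when the returned cost is below its chosen limit.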