pkgsrc-Changes archive
CVS commit: pkgsrc/textproc/libstemmer
Module Name: pkgsrc
Committed By: wiz
Date: Thu Apr 27 08:12:44 UTC 2023
Modified Files:
pkgsrc/textproc/libstemmer: Makefile distinfo
pkgsrc/textproc/libstemmer/patches: patch-GNUmakefile
Log Message:
libstemmer: update to 2.2.0.
Snowball 2.2.0 (2021-11-10)
===========================
New Code Generators
-------------------
* Add Ada generator from Stephane Carrez (#135).
Javascript
----------
* Fix generated code to use integer division rather than floating point
division.
Noted by David Corbett.
Pascal
------
* Fix code generated for division. Previously real division was used and the
generated code would fail to compile with an "Incompatible types" error.
Noted by David Corbett.
* Fix code generated for Snowball's `minint` and `maxint` constants.
Python
------
* Python 2 is no longer actively supported, as proposed on the mailing list:
https://lists.tartarus.org/pipermail/snowball-discuss/2021-August/001721.html
* Fix code generated for division. Previously the Python code we generated
used integer division but rounded negative fractions towards negative
infinity rather than zero under Python 2, and under Python 3 used floating
point division.
Noted by David Corbett.
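The three division behaviours described above can be compared directly under
Python 3. This is a minimal sketch for illustration only, not the code that
Snowball generates; `trunc_div` is a hypothetical helper name.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds towards zero (hypothetical helper)."""
    q = abs(a) // abs(b)
    # The quotient is negative only when the operands have different signs.
    return q if (a < 0) == (b < 0) else -q


# Floor division rounds a negative quotient towards negative infinity.
floor_result = -7 // 2        # -4
# True division yields a float under Python 3.
float_result = -7 / 2         # -3.5
# Truncating division rounds towards zero.
trunc_result = trunc_div(-7, 2)  # -3

print(floor_result, float_result, trunc_result)
```

The three results differ only for negative, non-exact quotients, which is
why such a discrepancy can go unnoticed for a long time.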
Code Quality Improvements
-------------------------
* C#: An `among` without functions is now generated as `static` and groupings
are now generated as constant. Patches from James Turner in #146 and #147.
Code generation improvements
----------------------------
* General:
+ Constant numeric subexpressions and constant numeric tests are now
evaluated at Snowball compile time.
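As a rough illustration of evaluating constant numeric subexpressions at
compile time, the following Python sketch folds constants in a tiny expression
tree. The node types (`Num`, `Var`, `BinOp`) and the `fold` function are
hypothetical and do not reflect the Snowball compiler's internal data
structures; division is left out to keep the sketch short.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]

# Only operators that cannot fail are folded in this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr: Expr) -> Expr:
    """Replace subexpressions whose operands are all constants by their value."""
    if isinstance(expr, BinOp):
        left, right = fold(expr.left), fold(expr.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr


# x + (2 * 3): the constant product is folded, the variable is kept.
folded = fold(BinOp("+", Var("x"), BinOp("*", Num(2), Num(3))))
print(folded)
```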
Behavioural changes to existing algorithms
------------------------------------------
* german2: Fix handling of `qu` to match algorithm description. Previously
the implementation erroneously did `skip 2` after `qu`. We suspect this was
intended to skip the `qu` but that's already been done by the substring/among
matching, so it actually skips an extra two characters.
The implementation has always differed in this way, but there's no good
reason to skip two extra characters here so overall it seems best to change
the code to match the description. This change only affects the stemming of
a single word in the sample vocabulary - `quae` which seems to actually be
Latin rather than German.
Optimisations to existing algorithms
------------------------------------
* arabic: Handle exception cases in the among they're exceptions to.
* greek: Remove unused slice setting, handle exception cases in the among
they're exceptions to, and turn `substring ... among ... or substring ...
among ...` into a single `substring ... among ...` in cases where it is
trivial to do so.
* hindi: Eliminate the need for variable `p`.
* irish: Minor optimisation in setting `pV` and `p1`.
* yiddish: Make use of `among` more.
Compiler
--------
* Fix handling of `len` and `lenof` being declared as names.
For compatibility with programs written for older Snowball versions,
`len` and `lenof` stop being tokens if declared as names. However, this
code didn't work correctly if the tokeniser's name buffer needed to
be enlarged to hold the token name (i.e. 3 or 5 elements respectively).
* Report a clearer error if `=` is used instead of `==` in an integer test.
* Replace a single entry command list with its contents in the internal syntax
tree. This puts things in a more canonical form, which helps subsequent
optimisations.
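The idea of replacing a single-entry command list with its contents can be
sketched in Python as follows. The `Cmd` and `CmdList` node types and the
`canonicalise` function are hypothetical and are not taken from the Snowball
compiler sources.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Cmd:
    name: str


@dataclass
class CmdList:
    items: List["Node"] = field(default_factory=list)


Node = Union[Cmd, CmdList]


def canonicalise(node: Node) -> Node:
    """Replace every command list holding exactly one entry by that entry."""
    if isinstance(node, CmdList):
        items = [canonicalise(child) for child in node.items]
        if len(items) == 1:
            return items[0]
        return CmdList(items)
    return node


# The inner list wraps a single command, so it collapses to that command.
tree = CmdList([CmdList([Cmd("next")]), Cmd("delete")])
canonical = canonicalise(tree)
print(canonical)
```

After this pass, later optimisations only have to recognise one shape for a
lone command instead of also handling the wrapped form.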
Build system
------------
* Support building on Microsoft Windows (using mingw+msys or a similar
Unix-like environment). Patch from Jannick in #129.
* Split out INCLUDES from CPPFLAGS so that CPPFLAGS can now be overridden by
the user if required. Fixes #148, reported by Dominique Leuenberger.
* Regenerate algorithms.mk only when needed rather than on every `make` run.
libstemmer
----------
* The libstemmer static library now has a `.a` extension, rather than `.o`.
Patch from Michal Vasilek in #150.
Testsuite
---------
* stemtest: Test that numbers and numeric codes aren't damaged by any of the
algorithms. Regression test for #66. Fixes #81.
* ada: Fix ada tests to fail if output differs. There was an extra `| head
-300` compared to other languages, which meant that the exit code of `diff`
was ignored. It seems more helpful (and is more consistent) not to limit how
many differences are shown so just drop this addition.
* go: Stop thinning testdata. It looks like we were only doing so because
the test harness code was based on that for rust, which was based on that for
javascript, which was only thinning because it was reading everything into
memory and the larger vocabulary lists were resulting in out of memory
issues.
* javascript: Speed up stemwords.js. Process input line-by-line rather than
reading the whole file into memory, splitting, iterating, and creating an
array with all the output, joining and writing out a single huge string.
This also means we can stop thinning the test data for javascript, which we
were only doing because the huge arabic test data file was causing out of
memory errors. Also drop the -p option, which isn't useful here and
complicates the code.
* rust: Turn on optimisation in the makefile rather than the CI config. This
makes the tests run in about 1/5 of the time and there's really no reason to
be thinning the testdata for rust.
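The line-by-line approach described for stemwords.js can be sketched in Python
as below. This is an illustration of the streaming pattern only, not a port of
stemwords.js; `stem_stream` and `placeholder_stem` are hypothetical names, and
`placeholder_stem` merely lowercases its input rather than performing any real
stemming.

```python
import io
from typing import Callable, TextIO


def stem_stream(src: TextIO, dst: TextIO, stem: Callable[[str], str]) -> int:
    """Stem one word per line, writing each result as soon as it is computed."""
    count = 0
    for line in src:  # text streams yield one line at a time
        dst.write(stem(line.rstrip("\n")) + "\n")
        count += 1
    return count


def placeholder_stem(word: str) -> str:
    # Stand-in for a real stemmer; assumption made for this sketch only.
    return word.lower()


src = io.StringIO("Running\nJumps\n")
dst = io.StringIO()
processed = stem_stream(src, dst, placeholder_stem)
print(processed, repr(dst.getvalue()))
```

Because only the current line is held at any time, memory use stays roughly
constant regardless of how large the vocabulary file is.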
Documentation
-------------
* CONTRIBUTING.rst: Improve documentation for adding a new stemming algorithm.
* Improve wording of Python docs.
To generate a diff of this commit:
cvs rdiff -u -r1.6 -r1.7 pkgsrc/textproc/libstemmer/Makefile
cvs rdiff -u -r1.5 -r1.6 pkgsrc/textproc/libstemmer/distinfo
cvs rdiff -u -r1.3 -r1.4 pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Modified files:
Index: pkgsrc/textproc/libstemmer/Makefile
diff -u pkgsrc/textproc/libstemmer/Makefile:1.6 pkgsrc/textproc/libstemmer/Makefile:1.7
--- pkgsrc/textproc/libstemmer/Makefile:1.6 Tue Jun 28 11:36:12 2022
+++ pkgsrc/textproc/libstemmer/Makefile Thu Apr 27 08:12:44 2023
@@ -1,8 +1,7 @@
-# $NetBSD: Makefile,v 1.6 2022/06/28 11:36:12 wiz Exp $
+# $NetBSD: Makefile,v 1.7 2023/04/27 08:12:44 wiz Exp $
-DISTNAME= snowball-2.1.0
+DISTNAME= snowball-2.2.0
PKGNAME= ${DISTNAME:S/snowball/libstemmer/}
-PKGREVISION= 2
CATEGORIES= textproc
MASTER_SITES= ${MASTER_SITE_GITHUB:=snowballstem/}
GITHUB_PROJECT= snowball
Index: pkgsrc/textproc/libstemmer/distinfo
diff -u pkgsrc/textproc/libstemmer/distinfo:1.5 pkgsrc/textproc/libstemmer/distinfo:1.6
--- pkgsrc/textproc/libstemmer/distinfo:1.5 Mon Apr 25 23:22:58 2022
+++ pkgsrc/textproc/libstemmer/distinfo Thu Apr 27 08:12:44 2023
@@ -1,7 +1,7 @@
-$NetBSD: distinfo,v 1.5 2022/04/25 23:22:58 tnn Exp $
+$NetBSD: distinfo,v 1.6 2023/04/27 08:12:44 wiz Exp $
-BLAKE2s (snowball-2.1.0.tar.gz) = 721871eda75290a6f9279d94beb5ab1383564dfdf782bf0cc9a8c0fccacc656d
-SHA512 (snowball-2.1.0.tar.gz) = 1efd7d8ab58852987e83247048244882c517e32237c8cb3c0558b66ecfb075733ce8805ebb76041e6e7d6664c236054effe66838e7c524ee529ce869aa8134f0
-Size (snowball-2.1.0.tar.gz) = 220324 bytes
-SHA1 (patch-GNUmakefile) = 85aa0b62ac0d51f31890d8c0f3eed82a3b3cba92
+BLAKE2s (snowball-2.2.0.tar.gz) = a342a1f35f5acd0c8bc7c2013cd9d4adac8db5e378a0874c3136e2adc2229f10
+SHA512 (snowball-2.2.0.tar.gz) = 02c43313de9de2518ea51cfb11f1c29145fc046c7838329bfdefd70b604009ad44b6db8175c25b0db31f03db30a6aec5857aa35775a9c204ec976df9cae62957
+Size (snowball-2.2.0.tar.gz) = 223846 bytes
+SHA1 (patch-GNUmakefile) = a4f2a2cf5409994302402433a6e1837ed82c5b08
SHA1 (patch-libstemmer_symbol.map) = 0122f03d0ac54dae908ffd873f1ae4a6e502a56f
Index: pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile
diff -u pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile:1.3 pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile:1.4
--- pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile:1.3 Mon Apr 25 23:22:58 2022
+++ pkgsrc/textproc/libstemmer/patches/patch-GNUmakefile Thu Apr 27 08:12:44 2023
@@ -1,30 +1,32 @@
-$NetBSD: patch-GNUmakefile,v 1.3 2022/04/25 23:22:58 tnn Exp $
+$NetBSD: patch-GNUmakefile,v 1.4 2023/04/27 08:12:44 wiz Exp $
* Build dynamic library, from archlinux.
---- GNUmakefile.orig 2021-01-21 04:50:09.000000000 +0000
+--- GNUmakefile.orig 2021-11-10 02:42:18.000000000 +0000
+++ GNUmakefile
-@@ -162,10 +162,10 @@ C_OTHER_OBJECTS = $(C_OTHER_SOURCES:.c=.
+@@ -170,12 +170,12 @@ C_OTHER_OBJECTS = $(C_OTHER_SOURCES:.c=.
JAVA_CLASSES = $(JAVA_SOURCES:.java=.class)
JAVA_RUNTIME_CLASSES=$(JAVARUNTIME_SOURCES:.java=.class)
-CFLAGS=-O2 -W -Wall -Wmissing-prototypes -Wmissing-declarations
--CPPFLAGS=-Iinclude
-+CFLAGS+=-fPIC -O2 -W -Wall -Wmissing-prototypes -Wmissing-declarations
-+CPPFLAGS+=-Iinclude
-
--all: snowball libstemmer.o stemwords $(C_OTHER_SOURCES) $(C_OTHER_HEADERS) $(C_OTHER_OBJECTS)
-+all: snowball libstemmer.o libstemmer.so stemwords $(C_OTHER_SOURCES) $(C_OTHER_HEADERS) $(C_OTHER_OBJECTS)
-
- clean:
- rm -f $(COMPILER_OBJECTS) $(RUNTIME_OBJECTS) \
-@@ -212,6 +212,9 @@ libstemmer/libstemmer.o: libstemmer/modu
- libstemmer.o: libstemmer/libstemmer.o $(RUNTIME_OBJECTS) $(C_LIB_OBJECTS)
- $(AR) -cru $@ $^
+-CPPFLAGS=
++CFLAGS+=-O2 -W -Wall -Wmissing-prototypes -Wmissing-declarations
++CPPFLAGS+=
+
+ INCLUDES=-Iinclude
+
+-all: snowball$(EXEEXT) libstemmer.a stemwords$(EXEEXT) $(C_OTHER_SOURCES) $(C_OTHER_HEADERS) $(C_OTHER_OBJECTS)
++all: snowball$(EXEEXT) libstemmer.so stemwords$(EXEEXT) $(C_OTHER_SOURCES) $(C_OTHER_HEADERS) $(C_OTHER_OBJECTS)
+
+ algorithms.mk: libstemmer/mkalgorithms.pl libstemmer/modules.txt
+ libstemmer/mkalgorithms.pl algorithms.mk libstemmer/modules.txt
+@@ -214,6 +214,9 @@ libstemmer/libstemmer.c: libstemmer/libs
+ libstemmer/libstemmer_utf8.c: libstemmer/libstemmer_c.in
+ sed 's/@MODULES_H@/modules_utf8.h/' $^ >$@
+libstemmer.so: libstemmer/libstemmer.o $(RUNTIME_OBJECTS) $(C_LIB_OBJECTS)
+ $(CC) $(CFLAGS) -shared $(LDFLAGS) -Wl,-soname,libstemmer.so.0 -Wl,--version-script=libstemmer/symbol.map -o $@.0.0.0 $^
+
- stemwords: $(STEMWORDS_OBJECTS) libstemmer.o
- $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $^
+ libstemmer/modules.h libstemmer/mkinc.mak: libstemmer/mkmodules.pl libstemmer/modules.txt
+ libstemmer/mkmodules.pl $@ $(c_src_dir) libstemmer/modules.txt libstemmer/mkinc.mak