pkgsrc-Changes archive
CVS commit: pkgsrc/meta-pkgs/nltk_data
Module Name: pkgsrc
Committed By: wiz
Date: Wed Nov 24 15:56:18 UTC 2021
Added Files:
pkgsrc/meta-pkgs/nltk_data: common.mk howto.md split.py
Log Message:
nltk_data: add shared files for nltk_data packages
This also includes a tool to create these packages.
To generate a diff of this commit:
cvs rdiff -u -r0 -r1.1 pkgsrc/meta-pkgs/nltk_data/common.mk \
pkgsrc/meta-pkgs/nltk_data/howto.md pkgsrc/meta-pkgs/nltk_data/split.py
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Added files:
Index: pkgsrc/meta-pkgs/nltk_data/common.mk
diff -u /dev/null pkgsrc/meta-pkgs/nltk_data/common.mk:1.1
--- /dev/null Wed Nov 24 15:56:18 2021
+++ pkgsrc/meta-pkgs/nltk_data/common.mk Wed Nov 24 15:56:18 2021
@@ -0,0 +1,24 @@
+# $NetBSD: common.mk,v 1.1 2021/11/24 15:56:18 wiz Exp $
+
+MASTER_SITES= https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/${TYPE}/
+EXTRACT_SUFX?= .zip
+
+MAINTAINER?= pkgsrc-users@NetBSD.org
+HOMEPAGE?= https://www.nltk.org/data.html
+COMMENT?= Natural Language Toolkit (NLTK) Data
+
+INSTALLATION_DIRS+= share/nltk_data/${TYPE}
+
+UNPACK?= no
+
+do-build:
+
+.if ${UNPACK} == "no"
+do-install:
+	${INSTALL_DATA} ${_DISTDIR}/${DISTNAME}${EXTRACT_SUFX} ${DESTDIR}${PREFIX}/share/nltk_data/${TYPE}
+.else
+USE_TOOLS+= pax
+
+do-install:
+	cd ${WRKDIR} && ${PAX} -pp -rw ${DISTNAME} ${DESTDIR}${PREFIX}/share/nltk_data/${TYPE}/
+.endif
Index: pkgsrc/meta-pkgs/nltk_data/howto.md
diff -u /dev/null pkgsrc/meta-pkgs/nltk_data/howto.md:1.1
--- /dev/null Wed Nov 24 15:56:18 2021
+++ pkgsrc/meta-pkgs/nltk_data/howto.md Wed Nov 24 15:56:18 2021
@@ -0,0 +1,21 @@
+# Sources
+
+Fetch https://www.nltk.org/nltk_data/, which is an XML file with an XSL
+stylesheet. Something like
+
+    wget -O nltk_data.xml https://www.nltk.org/nltk_data/
+
+should work.
+This file contains one line per data set (108 entries as of 2021-11-24)
+and some meta-package information.
+
+# Generating the packages
+
+Update the date in `split.py` (the PKGNAME suffix) and run it next to nltk_data.xml:
+
+    split.py
+
+It generates one package per entry in the list, in textproc/nltk_data-${id}.
+You'll then need to run 'make mdi' in each directory. If the package existed
+before, make sure that the data really changed (distinfo checksums/size differ)
+before committing.
Index: pkgsrc/meta-pkgs/nltk_data/split.py
diff -u /dev/null pkgsrc/meta-pkgs/nltk_data/split.py:1.1
--- /dev/null Wed Nov 24 15:56:18 2021
+++ pkgsrc/meta-pkgs/nltk_data/split.py Wed Nov 24 15:56:18 2021
@@ -0,0 +1,49 @@
+#!/usr/bin/env python3
+
+import os
+import xml.etree.ElementTree as ET
+
+tree = ET.parse('nltk_data.xml')
+
+root = tree.getroot()
+
+for child in root[0]:
+    id = child.attrib["id"]
+    path = f"/usr/pkgsrc/textproc/nltk_data-{id}"
+    try:
+        os.mkdir(path)
+    except FileExistsError:
+        pass
+    name = child.attrib["name"]
+    if "webpage" in child.attrib:
+        webpage = "HOMEPAGE=\t" + child.attrib["webpage"]
+    else:
+        webpage = ""
+    # Default to empty so a missing license is not inherited from the previous entry.
+    license = child.attrib.get("license", "")
+    subdir = child.attrib["subdir"]
+    url = child.attrib["url"]
+    with open(path + "/Makefile", "w") as f:
+        print(f"""# $NetBSD: split.py,v 1.1 2021/11/24 15:56:18 wiz Exp $
+
+DISTNAME= {id}
+PKGNAME= nltk_data-{id}-20211124
+CATEGORIES= textproc
+DIST_SUBDIR= ${{PKGNAME_NOREV}}
+
+{webpage}
+COMMENT= NLTK Data - {name}
+#LICENSE= {license}
+
+TYPE= {subdir}
+
+.include "../../meta-pkgs/nltk_data/common.mk"
+.include "../../mk/bsd.pkg.mk"
+""", file=f, end='')
+    with open(path + "/DESCR", "w") as f:
+        print(f"""This package contains data for NLTK, the Natural Language Toolkit.
+
+This package contains data from/for {name}.""", file=f)
+    with open(path + "/PLIST", "w") as f:
+        print(f"""@comment $NetBSD: split.py,v 1.1 2021/11/24 15:56:18 wiz Exp $
+share/nltk_data/{subdir}/{id}.zip""", file=f)
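
Not part of the commit: before running split.py against a fresh index, it may
help to preview what it would generate without writing anything under
/usr/pkgsrc. The following is only a sketch under stated assumptions. The
function name planned_packages and the two-entry SAMPLE_INDEX are hypothetical;
the sample merely mimics the attributes split.py reads (id, name, subdir, url,
license) and the layout implied by its root[0] traversal. The reported file
path follows the install location used by common.mk.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-entry sample; the real index has more entries and attributes.
SAMPLE_INDEX = """<nltk_data>
  <packages>
    <package id="example_a" name="Example A" subdir="corpora"
             url="https://example.invalid/example_a.zip" license="example"/>
    <package id="example_b" name="Example B" subdir="tokenizers"
             url="https://example.invalid/example_b.zip"/>
  </packages>
</nltk_data>"""


def planned_packages(xml_text):
    """Return (package directory, installed file, has_license) per index entry."""
    root = ET.fromstring(xml_text)
    plan = []
    # Same traversal as split.py: the first child of the root holds the entries.
    for entry in root[0]:
        pkg_id = entry.attrib["id"]
        subdir = entry.attrib["subdir"]
        plan.append((
            f"textproc/nltk_data-{pkg_id}",
            f"share/nltk_data/{subdir}/{pkg_id}.zip",
            "license" in entry.attrib,
        ))
    return plan


if __name__ == "__main__":
    for directory, installed, has_license in planned_packages(SAMPLE_INDEX):
        note = "" if has_license else "  (no license attribute)"
        print(f"{directory}: {installed}{note}")
```

To preview a downloaded index instead of the sample, pass the contents of
nltk_data.xml to planned_packages(). Entries flagged as lacking a license
attribute are the ones whose generated #LICENSE line needs a manual look.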