pkgsrc-WIP-changes archive

py-dask: Update to 2024.4.2



Module Name:	pkgsrc-wip
Committed By:	Matthew Danielson <matthewd%fastmail.us@localhost>
Pushed By:	matthewd
Date:		Tue Apr 30 15:21:26 2024 -0700
Changeset:	f716435d56cc272813391cc557a1947c0d6bc067

Modified Files:
	py-dask/Makefile
	py-dask/distinfo

Log Message:
py-dask: Update to 2024.4.2

2024.4.2
Highlights
Trivial Merge Implementation
The Query Optimizer will inspect queries to determine whether a merge(...) or groupby(...).apply(...) requires a shuffle. A shuffle can be avoided if the DataFrame was shuffled on the same columns in a previous step and no operation in between changed the partitioning layout or the relevant values in each partition.
result = df.merge(df2, on="a")
result = result.merge(df3, on="a")
The Query Optimizer will identify that result was already shuffled on "a" and will therefore shuffle only df3 in the second merge operation before doing a blockwise merge.
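The reasoning above can be illustrated with a toy model in plain Python. This is a sketch only and not Dask's implementation: the helpers shuffle and blockwise_merge, the partition count, and the sample rows are all hypothetical. It shows that once one side is hash-partitioned on "a", a later merge on "a" only needs the other side shuffled, after which matching rows already sit in the same partition.

```python
# Toy model of shuffle avoidance. NOT Dask's implementation;
# every name below is hypothetical and exists only for illustration.

NPARTITIONS = 4


def shuffle(rows, key, npartitions=NPARTITIONS):
    """Hash-partition rows (dicts) so equal key values share a partition."""
    partitions = [[] for _ in range(npartitions)]
    for row in rows:
        partitions[hash(row[key]) % npartitions].append(row)
    return partitions


def blockwise_merge(left_parts, right_parts, key):
    """Inner-join partition i of the left side with partition i of the right."""
    merged_parts = []
    for left, right in zip(left_parts, right_parts):
        merged = []
        for lrow in left:
            for rrow in right:
                if lrow[key] == rrow[key]:
                    merged.append({**lrow, **rrow})
        merged_parts.append(merged)
    return merged_parts


df = [{"a": i, "x": i * 10} for i in range(8)]
df2 = [{"a": i, "y": i * 100} for i in range(0, 8, 2)]
df3 = [{"a": i, "z": -i} for i in range(0, 8, 4)]

# First merge: both inputs are shuffled on "a".
result = blockwise_merge(shuffle(df, "a"), shuffle(df2, "a"), "a")

# Second merge: result is still partitioned on "a", so only df3 is shuffled.
result = blockwise_merge(result, shuffle(df3, "a"), "a")

rows = sorted((r for part in result for r in part), key=lambda r: r["a"])
print(rows)
```

The second call reuses the partitions produced by the first merge unchanged, which is the step the optimizer is described as skipping.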
Auto-partitioning in read_parquet
The Query Optimizer will automatically repartition datasets read from Parquet files if individual partitions are too small. This reduces the number of partitions and, consequently, the size of the task graph.
The Optimizer aims to produce partitions of at least 75MB and will combine multiple files if necessary to reach this threshold. The value can be configured with:
dask.config.set({"dataframe.parquet.minimum-partition-size": 100_000_000})
The value is given in bytes. The default threshold is relatively conservative to avoid memory issues on worker nodes with a relatively small amount of memory per thread.
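As a rough illustration of combining small files until a minimum size is reached, here is a greedy sketch in plain Python. It is not the optimizer's actual algorithm: the function name combine_files, the greedy strategy, the handling of leftover files, and the reading of 75MB as 75_000_000 bytes are all assumptions made for this example.

```python
# Greedy grouping sketch. NOT Dask's algorithm; combine_files and the
# 75_000_000-byte default are assumptions made for illustration.


def combine_files(file_sizes, minimum_partition_size=75_000_000):
    """Group consecutive files until each group reaches the minimum size.

    file_sizes is a list of sizes in bytes; the result is a list of
    partitions, each a list of indices into file_sizes.
    """
    partitions = []
    current = []
    current_size = 0
    for index, size in enumerate(file_sizes):
        current.append(index)
        current_size += size
        if current_size >= minimum_partition_size:
            partitions.append(current)
            current = []
            current_size = 0
    if current:
        # Leftover files that never reached the threshold join the last
        # partition, or form the only partition if none exists yet.
        if partitions:
            partitions[-1].extend(current)
        else:
            partitions.append(current)
    return partitions


sizes = [30_000_000, 30_000_000, 30_000_000, 80_000_000, 10_000_000]
plan = combine_files(sizes)
print(plan)  # [[0, 1, 2], [3, 4]]
```

Raising the threshold, for example to 100_000_000 as in the configuration line above, yields fewer and larger partitions for the same input.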

2024.4.1
This is a minor bugfix release that fixes an error when importing dask.dataframe with Python 3.11.9.
See GH#11035 and GH#11039 from Richard (Rick) Zamora for details.

2024.4.0
Highlights
Query planning fixes
This release contains a variety of bugfixes in Dask DataFrame’s new query planner.
GPU metric dashboard fixes
GPU memory and utilization dashboard functionality has been restored. Previously these plots were unintentionally left blank.
See GH#8572 from Benjamin Zaitlen for details.

2024.3.1
This is a minor release that primarily demotes an exception to a warning if dask-expr is not installed when upgrading.

2024.3.0
Released on March 11, 2024
Highlights
Query planning
This release enables query planning by default for all users of dask.dataframe.
The query planning functionality represents a rewrite of the DataFrame implementation using dask-expr. It is a drop-in replacement, and we expect that most users will not have to adjust any of their code. Feedback can be reported on the Dask issue tracker or on the query planning feedback issue.
If you encounter any issues, you can still opt out by setting:
import dask
dask.config.set({'dataframe.query-planning': False})
Sunset of Pandas 1.X support
The new query planning backend requires at least pandas 2.0. This pandas version will be installed automatically if you install from conda, or if you install dask[complete] or dask[dataframe] from pip.
The legacy DataFrame implementation still supports pandas 1.X if you install dask without extras.
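A small sketch of checking whether an installed pandas meets the 2.0 requirement, using only the standard library. The helper name meets_query_planning_requirement is hypothetical, and the check looks only at the major version number; it is not part of Dask.

```python
# Version check sketch. The helper name is hypothetical, not a Dask API.
from importlib import metadata


def meets_query_planning_requirement(pandas_version: str) -> bool:
    """Return True if the given pandas version string is 2.0 or newer."""
    major = int(pandas_version.split(".")[0])
    return major >= 2


try:
    installed = metadata.version("pandas")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("pandas is not installed")
else:
    print(installed, meets_query_planning_requirement(installed))
```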

To see a diff of this commit:
https://wip.pkgsrc.org/cgi-bin/gitweb.cgi?p=pkgsrc-wip.git;a=commitdiff;h=f716435d56cc272813391cc557a1947c0d6bc067

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.

diffstat:
 py-dask/Makefile | 2 +-
 py-dask/distinfo | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diffs:
diff --git a/py-dask/Makefile b/py-dask/Makefile
index 47290bda95..94c4e10d9f 100644
--- a/py-dask/Makefile
+++ b/py-dask/Makefile
@@ -1,6 +1,6 @@
 # $NetBSD$
 
-GITHUB_TAG=	2024.2.1
+GITHUB_TAG=	2024.4.2
 DISTNAME=	dask-${GITHUB_TAG}
 PKGNAME=	${PYPKGPREFIX}-${DISTNAME}
 GITHUB_PROJECT=	dask
diff --git a/py-dask/distinfo b/py-dask/distinfo
index ffbad31d19..becae025dd 100644
--- a/py-dask/distinfo
+++ b/py-dask/distinfo
@@ -1,5 +1,5 @@
 $NetBSD$
 
-BLAKE2s (dask-2024.2.1.tar.gz) = 6799a8f03ecfb71ed69206fbc660d7fbd9ecd054db49d501eabf530e0605bcea
-SHA512 (dask-2024.2.1.tar.gz) = 3863ec9126ba9fa0cf067a62d3d763d7cf52c6e49cdbfc258336b4536922c46f23443ff8aa4eb49176a38d2a70e03e2d5ca8a2c4b96c98d474654eec4e44c9c5
-Size (dask-2024.2.1.tar.gz) = 9341330 bytes
+BLAKE2s (dask-2024.4.2.tar.gz) = 4173442c74a3e98625d0efc0c260378bf2258f4431774a58ede6c543ced09d70
+SHA512 (dask-2024.4.2.tar.gz) = 1562cc3ad55973e14526d07d965aff0a41b0521a212070f4dc191bb9c4c48a4ea03c1c196b288b91ee1e917ec91faa1cab227476bbf619aa9cc7beae4bb60042
+Size (dask-2024.4.2.tar.gz) = 9364544 bytes

