dask-cudf-cu12

26.4.0 verified Mon Apr 27 auth: no python

Utilities for integrating Dask with cuDF on CUDA 12.x. This package provides distributed DataFrame functionality backed by cuDF's GPU-accelerated columnar operations. Version 26.4.0 requires Python >=3.11 and is part of the RAPIDS 26.04 release. Releases follow the RAPIDS release schedule, which is roughly one release every two months.

pip install dask-cudf-cu12
error ModuleNotFoundError: No module named 'dask_cudf'
cause Package not installed or wrong variant installed.
fix
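pip install dask-cudf-cu12 (for CUDA 12.x). For CUDA 11.x, the pip wheel is dask-cudf-cu11, which is published only for older RAPIDS releases; the unsuffixed name dask-cudf is the conda package name, not a usable pip wheel.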
error AttributeError: module 'dask_cudf' has no attribute 'from_cudf'
cause Old or mismatched version of dask-cudf/cudf.
fix
Upgrade packages: pip install --upgrade dask-cudf-cu12 cudf-cu12
breaking dask-cudf-cu12 is CUDA 12.x only. For CUDA 11.x, the matching pip wheel is dask-cudf-cu11, published for older RAPIDS releases. Installing the wrong variant for your CUDA version will cause import errors.
fix Check your CUDA version with nvidia-smi. Install dask-cudf-cu12 if it reports CUDA 12.x; for CUDA 11.x, use dask-cudf-cu11 from an older RAPIDS release.
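The nvidia-smi check can be scripted. The following is a minimal sketch, not an official RAPIDS tool: the function names cuda_major_from_smi and detect_cuda_major are made up for this example, and it assumes the nvidia-smi header contains a "CUDA Version: X.Y" field. That field reports the highest CUDA version the driver supports, which may differ from the toolkit version installed in your environment.

```python
import re
import shutil
import subprocess


def cuda_major_from_smi(text):
    """Return the CUDA major version found in nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    return int(match.group(1)) if match else None


def detect_cuda_major():
    """Run nvidia-smi if it is on PATH and parse its header; None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return cuda_major_from_smi(result.stdout)
```

Calling detect_cuda_major() on a GPU machine returns an integer such as 12, which you can compare against the -cu12 suffix before installing.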
deprecated DataFrame.apply_chunks and Groupby.apply_grouped were removed in v25.12.00. Use map_partitions or groupby.apply instead.
fix Replace df.apply_chunks(func, ...) with df.map_partitions(func). For grouped operations, use groupby_obj.apply(func, meta=...).
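A hedged sketch of the map_partitions replacement follows. The column names a and b and the helpers add_total and run_example are illustrative, not taken from the package. The GPU imports live inside run_example so the file can be loaded on a machine without a GPU; calling run_example itself requires a working cuDF installation.

```python
def add_total(part):
    """Per-partition function: receives one partition and returns a new one.

    With dask_cudf each partition is a cuDF DataFrame, so any
    cuDF DataFrame operation can be used here.
    """
    return part.assign(total=part["a"] + part["b"])


def run_example():
    # Imported here (assumption: a CUDA-capable GPU and cuDF are available).
    import cudf
    import dask_cudf

    gdf = cudf.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
    ddf = dask_cudf.from_cudf(gdf, npartitions=2)
    # map_partitions calls add_total once per partition, then compute()
    # gathers the partitions back into a single cuDF DataFrame.
    return ddf.map_partitions(add_total).compute()
```

Because add_total only sees one partition at a time, it should not rely on rows that may live in another partition.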
gotcha Installing more than one dask-cudf variant in the same environment (via conda, pip, or a mix of both) leads to import confusion, because every variant provides the same dask_cudf module. Only one variant should be installed.
fix Use separate environments for CUDA 11.x and 12.x, and install only the correct variant in each.
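To see which variants are present in the current environment, something like the following sketch may help. It uses only the standard library; the list of distribution names in VARIANTS is an assumption about which names are relevant and may need adjusting for your setup.

```python
from importlib import metadata

# Distribution names assumed to provide the `dask_cudf` import package.
VARIANTS = ("dask-cudf", "dask-cudf-cu11", "dask-cudf-cu12")


def installed_variants(names=VARIANTS):
    """Return {distribution name: version} for each name that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return found


def has_conflict(found):
    """More than one installed variant means the environment is mixed."""
    return len(found) > 1
```

If has_conflict(installed_variants()) is true, uninstall all but one variant before importing dask_cudf.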
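gotcha dask_cudf.from_cudf() does not choose a partition count that matches your hardware. If the resulting collection has fewer partitions than there are GPUs, or the source cuDF DataFrame has too few rows to fill them, Dask may underutilize GPUs.
fix Set the npartitions parameter explicitly: dask_cudf.from_cudf(df, npartitions=len(gpu_devices)), where gpu_devices stands for your own list of available GPUs.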
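One way to pick the partition count is sketched below. The helper pick_npartitions and its per_worker parameter are hypothetical, and the heuristic (one partition per GPU worker, never more partitions than rows) is only a starting point, not a RAPIDS recommendation.

```python
def pick_npartitions(n_rows, n_workers, per_worker=1):
    """Choose a partition count for dask_cudf.from_cudf.

    Aims for `per_worker` partitions per GPU worker, but never returns
    more partitions than rows and never fewer than one.
    """
    wanted = max(1, n_workers * per_worker)
    return max(1, min(wanted, n_rows))


# Example with plain numbers: 3 rows cannot fill 8 workers.
print(pick_npartitions(n_rows=3, n_workers=8))                    # 3
print(pick_npartitions(n_rows=1000, n_workers=4, per_worker=2))   # 8
```

The result can be passed as dask_cudf.from_cudf(df, npartitions=pick_npartitions(len(df), n_workers)), with n_workers taken from however you count GPUs in your cluster.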
conda install -c rapidsai -c conda-forge dask-cudf

Creates a dask_cudf DataFrame from a cuDF DataFrame and computes the result.

import dask_cudf
import cudf

# Create a cuDF DataFrame, then split it into a two-partition dask_cudf DataFrame
df_cudf = cudf.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
ddf = dask_cudf.from_cudf(df_cudf, npartitions=2)
print(ddf.compute())