Dask-CUDA
Dask-CUDA is a Python library providing utilities to facilitate interactions between Dask and NVIDIA CUDA-enabled GPUs. It extends `dask.distributed`'s `LocalCluster` and `Worker` to manage and deploy Dask workers efficiently on GPU systems. Key features include automatic instantiation of per-GPU workers, setting CPU affinity for optimal performance, and robust GPU memory management, including spilling to host memory. It is a core component of the RAPIDS suite for GPU-accelerated data science. The library maintains an active development status with regular releases, currently at version 26.4.0.
Common errors
- `KeyError: 'protocol ucx does not exist'`
  - cause: The `ucx` protocol in Dask `distributed` now requires the `distributed-ucxx` package; `UCX-Py` support was deprecated.
  - fix: Install `distributed-ucxx` (e.g., `pip install distributed-ucxx`) and update your Dask configuration to use the new package.
- `CUDA_ERROR_OUT_OF_MEMORY: out of memory`
  - cause: GPU memory limits are exceeded during computation. This is common when the RAPIDS Memory Manager (RMM) is not used, or when spilling is not enabled or configured properly.
  - fix: Initialize `LocalCUDACluster` with `rmm_pool_size` (e.g., `rmm_pool_size=0.9`) and `enable_cudf_spill=True` to pool GPU memory with RMM and spill to host memory when necessary.
- `dask.distributed.worker - WARNING - No NVIDIA devices detected`
  - cause: Dask workers cannot detect available GPUs, possibly because the `CUDA_VISIBLE_DEVICES` environment variable is not set correctly, or because of NVIDIA driver/CUDA Toolkit issues.
  - fix: Ensure NVIDIA drivers are installed and functional. On a multi-GPU system, explicitly set `CUDA_VISIBLE_DEVICES` in your environment or pass it to `LocalCUDACluster` (e.g., `LocalCUDACluster(CUDA_VISIBLE_DEVICES="0,1")`).
- `AttributeError: 'DataFrame' object has no attribute 'cudf'`
  - cause: Using `dask-cudf`-specific methods on a generic `dask.dataframe` without setting the backend to `"cudf"`.
  - fix: With `cudf` installed, call `dask.config.set({"dataframe.backend": "cudf"})` after importing `dask` so Dask DataFrame uses cuDF as its backend.
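For the `CUDA_VISIBLE_DEVICES` issue above, a hypothetical helper (not part of dask-cuda's public API) sketches how a comma-separated device string resolves to device indices; dask-cuda's own resolution also handles GPU UUIDs, which this sketch does not:

```python
import os

def visible_devices(default="0"):
    """Return the GPU indices a process would see, as a list of ints.

    Hypothetical sketch: only mirrors the comma-separated integer form
    of CUDA_VISIBLE_DEVICES (GPU UUID strings are not handled here).
    """
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", default)
    return [int(d) for d in raw.split(",") if d.strip() != ""]

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
print(visible_devices())  # [0, 1]
```

If `visible_devices()` comes back empty or missing expected indices, the environment variable (or driver setup) is the first place to look.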
Warnings
- breaking The `numba` package is no longer a direct dependency as of v26.04.00. While `numba-cuda` remains relevant for CUDA JIT compilation, direct usage or reliance on the base `numba` package within `dask-cuda` might lead to issues.
- breaking Support for the `UCX-Py` library has been removed in favor of `distributed-ucxx` starting from v25.10.00. Direct use of `protocol='ucx'` may fail without the new `distributed-ucxx` package.
- breaking CUDA 11 support was removed from dependencies starting with v25.08.00. Users on CUDA 11 might experience compatibility issues or build failures.
- breaking Legacy `Dask-cuDF` handling was removed in v25.02.00a. Older patterns for integrating `dask-cudf` might no longer work as expected.
- gotcha When using `LocalCUDACluster` in a standalone Python script, it is crucial to enclose the cluster and client initialization within an `if __name__ == "__main__":` block. Failure to do so can lead to unexpected behavior, deadlocks, or errors related to subprocess spawning.
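The `if __name__ == "__main__":` gotcha can be illustrated with a stdlib analogy (plain `multiprocessing`, not dask-cuda itself): worker processes launched via the `spawn` start method re-import the parent script, so any unguarded top-level cluster/pool creation re-runs in every child:

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # Without this guard, each spawned child re-imports this script and
    # re-executes top-level code, re-creating the pool recursively.
    # LocalCUDACluster launches worker subprocesses and is subject to
    # the same constraint.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

The same structure is why the Quickstart below wraps cluster and client creation in the guard.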
Install
- pip install dask-cuda
- pip install 'dask-cuda[cu12]'  # for CUDA 12
- pip install 'dask-cuda[cu13]'  # for CUDA 13
- conda install -c rapidsai -c conda-forge dask-cuda cuda-version=13.1
Imports
- LocalCUDACluster
from dask_cuda import LocalCUDACluster
- Client
from dask.distributed import Client
Quickstart
```python
import os

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

if __name__ == "__main__":
    # Recommended to run inside an if __name__ == "__main__": block
    # Configure 2 GPUs, a 90% RMM pool, and cuDF spilling to host memory
    cluster = LocalCUDACluster(
        CUDA_VISIBLE_DEVICES="0,1",  # Example: use devices 0 and 1
        rmm_pool_size=0.9,           # Use 90% of GPU memory as a pool
        enable_cudf_spill=True,      # Enable spilling to host memory if needed
        local_directory=os.environ.get("DASK_LOCAL_DIRECTORY", "/tmp/dask-cuda"),
    )
    client = Client(cluster)
    print(f"Dask-CUDA cluster dashboard link: {client.dashboard_link}")

    # Your Dask-accelerated GPU computations go here.
    # For example, with dask-cudf:
    # import dask
    # import dask.dataframe as dd
    # dask.config.set({"dataframe.backend": "cudf"})
    # ddf = dd.read_csv("my_gpu_data.csv")
    # result = ddf.groupby("col").sum().compute()

    client.close()
    cluster.close()
```
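To make `rmm_pool_size=0.9` concrete: dask-cuda interprets a float as a fraction of total device memory (it also accepts byte counts and strings like `"24GB"`). A simplified sketch of that fraction-to-bytes conversion, assuming the real parsing lives in dask-cuda's internal utilities:

```python
def fraction_to_bytes(total_bytes: int, pool_size: float) -> int:
    """Convert a fractional pool size (e.g. 0.9) into a byte count.

    Simplified sketch of the float case only; dask-cuda's own parsing
    also accepts ints (bytes) and strings such as "24GB".
    """
    if not 0.0 < pool_size <= 1.0:
        raise ValueError("fractional pool size must be in (0, 1]")
    return int(total_bytes * pool_size)

# Example: 90% of a 16 GiB GPU
print(fraction_to_bytes(16 * 1024**3, 0.9))  # 15461882265
```

Sizing the pool slightly below total memory leaves headroom for CUDA context and driver allocations outside the RMM pool.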