CUDA Core Compute Libraries (CCCL) for Python
The `nvidia-cuda-cccl` Python package provides Pythonic interfaces to the NVIDIA CUDA Core Compute Libraries (CCCL), specifically CUB and Thrust. CCCL itself unifies essential CUDA C++ libraries (Thrust, CUB, libcudacxx) to offer building blocks for writing safe and efficient CUDA C++ code. This Python binding enables developers to leverage GPU-accelerated parallel algorithms and cooperative primitives directly from Python, easing the implementation of custom algorithms without needing to drop down to C++. The library is currently in 'experimental' status, meaning its API and feature set can evolve rapidly.
Warnings
- breaking The `cuda-cccl` Python package is in 'experimental' status, meaning its API and feature set can change rapidly. Users should anticipate potential breaking changes between minor versions.
- breaking CCCL is not forward compatible with the CUDA Toolkit: an older version of CCCL is not guaranteed to work with a newer CUDA Toolkit. The reverse generally holds, however, and a newer CCCL can usually be integrated with an older CUDA Toolkit.
- breaking For C++ users, CCCL 3.0 (bundled with CUDA Toolkit 13.0) introduced significant breaking changes, including dropping support for C++11/14, CUDA Toolkit versions prior to 12.0, and older host compilers (GCC < 7, Clang < 14, MSVC < 2019). It also removed support for ICC and CUDA Dynamic Parallelism v1.
- breaking Starting with CUDA Toolkit 13.0 (which includes CCCL 3.0), the on-disk header location for CCCL in the CUDA Toolkit installation moved to `${CTK_ROOT}/include/cccl/`. Also, the `nvidia-cuda-cccl` PyPI package will no longer use `cuXX` suffixes (e.g., `nvidia-cuda-cccl-cu12`) to allow upstream libraries to select the CUDA version.
- gotcha When compiling CUDA C++ code with `nvcc`, it automatically adds CCCL headers to your include path. However, if compiling with other compilers, you must manually update your build system's include search path to point to the CCCL headers.
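To illustrate the last gotcha, a non-`nvcc` host compilation might add the CCCL headers explicitly. This is a hedged sketch: the toolkit root and source file name are placeholders, and the `include/cccl/` subdirectory applies to the CUDA Toolkit 13.0+ layout described above.

```shell
# Hypothetical paths; set CTK_ROOT to your actual CUDA Toolkit installation.
CTK_ROOT=/usr/local/cuda

# CUDA Toolkit 13.0+ header layout (CCCL headers moved under include/cccl/):
g++ -I"${CTK_ROOT}/include/cccl" -I"${CTK_ROOT}/include" -c host_code.cpp
```

With `nvcc` these flags are unnecessary, since it injects the CCCL include paths automatically.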
Install
- pip install cuda-cccl[cu13]   # For CUDA 13.x
- pip install cuda-cccl[cu12]   # For CUDA 12.x
- conda install -c conda-forge cccl-python
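After installing, a quick GPU-free way to confirm the package is importable is to probe for the `cuda.compute` module with the standard library. This sketch uses only `importlib` and works whether or not `cuda-cccl` is present:

```python
import importlib.util

def cccl_available() -> bool:
    """Return True if the cuda.compute module can be found on this system."""
    try:
        return importlib.util.find_spec("cuda.compute") is not None
    except ModuleNotFoundError:
        # The parent 'cuda' namespace package itself is not installed.
        return False

print("cuda.compute importable:", cccl_available())
```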
Imports
- compute: `import cuda.compute`
- coop: `import cuda.coop`
Quickstart
import cuda.compute
import cupy as cp

def reduce_sum_example():
    # Initialize CuPy array on GPU
    arr = cp.arange(1000, dtype=cp.int32)
    # Perform a reduction sum using cuda.compute
    result = cuda.compute.reduce(arr, op='sum')
    print(f"Original array sum (CuPy): {arr.sum()}")
    print(f"Reduced sum (cuda.compute): {result}")
    assert result == arr.sum()

if __name__ == "__main__":
    reduce_sum_example()
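For intuition about what the GPU reduction computes, it is semantically a pairwise tree reduction over the input with an associative operator. The following pure-Python sketch (no GPU required) mirrors how partial results are combined; `tree_reduce` is an illustrative helper, not part of the `cuda.compute` API:

```python
import operator

def tree_reduce(values, op):
    """Pairwise tree reduction: combine neighbors level by level,
    mirroring how GPU reductions merge partial results."""
    vals = list(values)
    if not vals:
        raise ValueError("empty input")
    while len(vals) > 1:
        paired = [op(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:  # carry an unpaired trailing element to the next level
            paired.append(vals[-1])
        vals = paired
    return vals[0]

# Same input as the Quickstart: 0 + 1 + ... + 999
print(tree_reduce(range(1000), operator.add))  # 499500, matching arr.sum()
```

Because the operator must be associative for this regrouping to be valid, non-associative operations (e.g. floating-point subtraction) can produce different results than a sequential loop.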