CUDA Core Compute Libraries (CCCL) for Python

13.2.27 · active · verified Sat Apr 11

The `nvidia-cuda-cccl` Python package provides Pythonic interfaces to the NVIDIA CUDA Core Compute Libraries (CCCL), specifically CUB and Thrust. CCCL itself unifies essential CUDA C++ libraries (Thrust, CUB, libcudacxx) to offer building blocks for writing safe and efficient CUDA C++ code. This Python binding enables developers to leverage GPU-accelerated parallel algorithms and cooperative primitives directly from Python, easing the implementation of custom algorithms without needing to drop down to C++. The library is currently in 'experimental' status, meaning its API and feature set can evolve rapidly.

Warnings

This library is experimental: its API and feature set can change between releases without deprecation notices.

Install

pip install nvidia-cuda-cccl

Imports

import cuda.compute
import cupy as cp

Quickstart

This example demonstrates how to perform a parallel reduction sum on a CuPy array using `cuda.compute.reduce`. It showcases a basic use case of the Pythonic interface to GPU-accelerated algorithms.

import cuda.compute
import cupy as cp

def reduce_sum_example():
    # Initialize CuPy array on GPU
    arr = cp.arange(1000, dtype=cp.int32)
    
    # Perform a reduction sum using cuda.compute
    result = cuda.compute.reduce(arr, op='sum')
    
    print(f"Original array sum (CuPy): {arr.sum()}")
    print(f"Reduced sum (cuda.compute): {result}")
    assert result == arr.sum()

if __name__ == "__main__":
    reduce_sum_example()
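For intuition, the parallel reduction that libraries like CUB perform on the GPU can be sketched on the CPU as a pairwise tree reduction: each pass combines adjacent pairs, halving the number of partial results until one remains. This is a NumPy sketch of the pattern, not the library's actual implementation:

```python
import numpy as np

def tree_reduce_sum(values: np.ndarray) -> int:
    """Pairwise tree reduction. Each pass sums adjacent pairs, halving the
    number of partial sums; a GPU does each pass's additions in parallel."""
    vals = values.astype(np.int64)  # widen to avoid overflow in partial sums
    while vals.size > 1:
        if vals.size % 2:
            # Odd count: pair up all but the last element, carry it forward
            vals = np.append(vals[:-1:2] + vals[1::2], vals[-1])
        else:
            vals = vals[::2] + vals[1::2]
    return int(vals[0])

arr = np.arange(1000, dtype=np.int32)
print(tree_reduce_sum(arr))  # → 499500, same as arr.sum()
```

A tree reduction needs only O(log n) passes over the data, which is why reductions map so well to GPUs: within each pass, every pairwise combine is independent.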
