{"id":3181,"library":"nvidia-cuda-cccl","title":"CUDA Core Compute Libraries (CCCL) for Python","description":"The `nvidia-cuda-cccl` Python package provides Pythonic interfaces to the NVIDIA CUDA Core Compute Libraries (CCCL), specifically CUB and Thrust. CCCL unifies the essential CUDA C++ libraries (Thrust, CUB, libcudacxx) into a single set of building blocks for writing safe and efficient CUDA C++ code. The Python bindings let developers use GPU-accelerated parallel algorithms and cooperative primitives directly from Python, making it easier to implement custom algorithms without dropping down to C++. The library is currently in 'experimental' status, meaning its API and feature set can evolve rapidly.","status":"active","version":"13.2.27","language":"en","source_language":"en","source_url":"https://github.com/NVIDIA/cccl","tags":["cuda","gpu","parallel-computing","nvidia","python","high-performance-computing","numerical-computing"],"install":[{"cmd":"pip install cuda-cccl[cu13] # For CUDA 13.x","lang":"bash","label":"Pip (CUDA 13.x)"},{"cmd":"pip install cuda-cccl[cu12] # For CUDA 12.x","lang":"bash","label":"Pip (CUDA 12.x)"},{"cmd":"conda install -c conda-forge cccl-python","lang":"bash","label":"Conda (conda-forge)"}],"dependencies":[{"reason":"Required runtime environment.","package":"Python","version":">=3.10"},{"reason":"GPU acceleration requires a compatible CUDA Toolkit installation.","package":"CUDA Toolkit","version":"12.x or 13.x"},{"reason":"Requires a GPU with Compute Capability 6.0 or higher.","package":"NVIDIA GPU","version":"Compute Capability 6.0+"}],"imports":[{"note":"Provides device-level parallel algorithms like reduce, scan, and sort.","symbol":"compute","correct":"import cuda.compute"},{"note":"Provides block- and warp-level cooperative primitives for custom CUDA kernels.","symbol":"coop","correct":"import cuda.coop"}],"quickstart":{"code":"import cuda.compute\nimport cupy as cp\n\ndef reduce_sum_example():\n    # Initialize a CuPy array on the GPU\n    arr = cp.arange(1000, dtype=cp.int32)\n\n    # Perform a reduction sum using cuda.compute.\n    # NOTE: this call is illustrative; the package is experimental, so\n    # verify the current function name and signature in the cuda.compute docs.\n    result = cuda.compute.reduce(arr, op='sum')\n\n    print(f\"Original array sum (CuPy): {arr.sum()}\")\n    print(f\"Reduced sum (cuda.compute): {result}\")\n    assert result == arr.sum()\n\nif __name__ == \"__main__\":\n    reduce_sum_example()","lang":"python","description":"This example sketches a parallel reduction sum on a CuPy array via the Pythonic interface to GPU-accelerated algorithms. The `cuda.compute.reduce(arr, op='sum')` call is illustrative: because the package is experimental, check the current cuda.compute documentation for the exact function name and signature before relying on it."},"warnings":[{"fix":"Always refer to the latest documentation and release notes before upgrading, and pin exact versions in production environments.","message":"The `cuda-cccl` Python package is in 'experimental' status: its API and feature set can change rapidly, and users should anticipate breaking changes between minor versions.","severity":"breaking","affected_versions":"All versions of the `cuda-cccl` Python package."},{"fix":"Ensure that your CCCL version is the same as or newer than the version bundled with your CUDA Toolkit installation; when in doubt, use the latest CCCL version compatible with your CUDA Toolkit.","message":"CCCL is not forward compatible with the CUDA Toolkit: an older CCCL will not work with a newer CUDA Toolkit. A newer CCCL, however, can generally be used with an older CUDA Toolkit.","severity":"breaking","affected_versions":"All versions"},{"fix":"Update your C++ standard to C++17 or newer, ensure your CUDA Toolkit is 12.0+, and use compatible host compilers. Consult the CCCL 2.x to 3.0 Migration Guide for detailed steps.","message":"For C++ users, CCCL 3.0 (bundled with CUDA Toolkit 13.0) introduced significant breaking changes: it dropped support for C++11/14, for CUDA Toolkit versions prior to 12.0, and for older host compilers (GCC < 7, Clang < 14, MSVC < 2019). It also removed support for ICC and CUDA Dynamic Parallelism v1.","severity":"breaking","affected_versions":"CCCL 3.0 and newer (and the corresponding CUDA Toolkit 13.0+)"},{"fix":"Update build systems to reflect the new header paths. For Python installations, prefer `pip install cuda-cccl[cuXX]` to pin a specific CUDA major version, or plain `pip install cuda-cccl` on CUDA 13.0+ to let upstream libraries select the CUDA version.","message":"Starting with CUDA Toolkit 13.0 (which includes CCCL 3.0), the on-disk location of the CCCL headers in the CUDA Toolkit installation moved to `${CTK_ROOT}/include/cccl/`. In addition, the `nvidia-cuda-cccl` PyPI package will no longer use `cuXX` suffixes (e.g., `nvidia-cuda-cccl-cu12`), allowing upstream libraries to select the CUDA version.","severity":"breaking","affected_versions":"CCCL 3.0 and newer (and the corresponding CUDA Toolkit 13.0+)"},{"fix":"For non-nvcc compilers, explicitly add the path to the CCCL headers (e.g., `/usr/local/cuda/include` or `${CTK_ROOT}/include/cccl/`) to your build system's include search path. Prefer `-I` over `-isystem` to avoid collisions with implicitly included headers.","message":"When compiling CUDA C++ code with `nvcc`, the CCCL headers are added to the include path automatically. With other compilers, you must update your build system's include search path to point to the CCCL headers yourself.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}