CUDA Python Bindings

13.2.0 verified Tue May 12 auth: no python install: stale

cuda-bindings provides low-level Python wrappers for the NVIDIA CUDA C driver and runtime APIs. It is a core component of the broader NVIDIA 'CUDA Python' initiative, aiming to unify and simplify GPU-accelerated computing in Python. The current version is 13.2.0, with releases often tied to CUDA Toolkit versions and ongoing development to integrate Python as a first-class language in the CUDA ecosystem.

pip install cuda-bindings

Common errors

error ModuleNotFoundError: No module named 'cuda' ↓

cause This error typically occurs when the `cuda-python` package (which includes `cuda-bindings`) is not installed, or when Python cannot locate the installed package due to environment configuration issues (e.g., incorrect virtual environment or `PYTHONPATH`).

fix

Ensure the cuda-python package is installed: pip install cuda-python. If already installed, verify your Python environment is correctly activated and that the package is accessible. In some cases, a version mismatch with other GPU-accelerated libraries might also cause this, requiring specific cuda-python versions.
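To tell a missing install apart from a wrong environment, it can help to ask the running interpreter what it actually sees. The following is a minimal sketch using only the standard library; the helper name `describe_cuda_install` is hypothetical, and the distribution and module names it checks are the ones mentioned on this page.

```python
import importlib.metadata
import importlib.util
import sys


def describe_cuda_install():
    """Report which CUDA Python pieces this interpreter can see."""
    report = {"python": sys.executable, "distributions": {}, "modules": {}}

    # Installed distribution versions (None when not installed in this environment).
    for dist in ("cuda-python", "cuda-bindings"):
        try:
            report["distributions"][dist] = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            report["distributions"][dist] = None

    # Importable modules; find_spec raises if the parent package is missing.
    for name in ("cuda", "cuda.bindings"):
        try:
            report["modules"][name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            report["modules"][name] = False
    return report


if __name__ == "__main__":
    for key, value in describe_cuda_install().items():
        print(f"{key}: {value}")
```

If the `python` path printed here is not the interpreter of the environment you installed into, the problem is the environment rather than the package.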

error ModuleNotFoundError: No module named 'cuda.bindings' ↓

cause This error often indicates that while a `cuda-python` related package might be present, the specific `cuda.bindings` submodule is not found. This can happen due to an older version of `cuda-python` where the module layout was different, or an incomplete/corrupted installation.

error RuntimeError: CUDA driver failed to initialize: <error message> ↓

cause This runtime error signifies a problem with the NVIDIA CUDA driver or its interaction with the `cuda-bindings` library. Common causes include an outdated or incompatible GPU driver, issues with `LD_LIBRARY_PATH` (on Linux) not including CUDA runtime libraries, or a driver mismatch in containerized environments.

fix

Update your NVIDIA GPU drivers to the latest version. For Linux, ensure LD_LIBRARY_PATH correctly points to your CUDA installation's lib64 directory. If using Docker, ensure the NVIDIA Container Toolkit is correctly installed and configured, and the container is run with --runtime=nvidia --gpus all.
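Before changing drivers or container settings, a small probe can show which stage fails: importing the bindings, loading the driver library, or `cuInit` itself. This is a sketch under the assumption that the `cuda.bindings.driver` layout is installed; `try_init_driver` is a hypothetical helper name, and the exact exception raised when the driver library cannot be loaded may vary between versions, so it is caught broadly.

```python
def try_init_driver():
    """Return (ok, message) describing whether cuInit succeeded."""
    try:
        from cuda.bindings import driver
    except ImportError as exc:
        return False, f"cuda.bindings is not importable: {exc}"

    try:
        # Driver calls return a tuple whose first element is a CUresult status.
        (err,) = driver.cuInit(0)
    except Exception as exc:  # e.g. the driver library could not be loaded
        return False, f"driver call could not be made: {exc}"

    if err != driver.CUresult.CUDA_SUCCESS:
        return False, f"cuInit failed with {err.name}"
    return True, "CUDA driver initialized"


if __name__ == "__main__":
    print(try_init_driver())
```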

error RuntimeError: ('Unable to allocate CUDA array:', <cudaError_t.cudaErrorInsufficientDriver: 35>) ↓

cause This specific runtime error indicates that the CUDA driver installed on your system is too old for the `cuda-bindings` version you are trying to use. The `pip` installer might select a newer `cuda-bindings` version that requires a more modern device driver.

fix

Update your NVIDIA GPU drivers to the latest version. Alternatively, pin cuda-bindings to a release that your current driver supports, e.g., pip install cuda-bindings==12.8 (adjust the version to the CUDA version your driver reports, which nvidia-smi shows in its header).
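CUDA reports versions as a single integer (1000 * major + 10 * minor), so a driver version of 12080 means CUDA 12.8. The sketch below decodes that integer and applies a rough major-version comparison against the installed bindings. The comparison rule is an assumption made for illustration, not NVIDIA's official compatibility policy, and both helper names are hypothetical.

```python
def decode_cuda_version(encoded):
    """Split CUDA's integer version (1000 * major + 10 * minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_supports(driver_version, bindings_version):
    """Rough check: the driver's CUDA major must be at least the bindings' major."""
    driver_major, _ = decode_cuda_version(driver_version)
    bindings_major = int(bindings_version.split(".")[0])
    return driver_major >= bindings_major


print(decode_cuda_version(12080))
print(driver_supports(12080, "13.2.0"))
```

With the bindings installed, the encoded value would come from `driver.cuDriverGetVersion()`, which returns a `(status, version)` tuple, and the bindings version from `importlib.metadata.version("cuda-bindings")`.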

error from cuda import cuda, cudart ↓

cause While not an error message, this import pattern frequently leads to `ModuleNotFoundError` or `ImportError` because the module layout of `cuda-python` has changed in recent versions. Direct imports like `from cuda import cuda` are often no longer valid or recommended for core driver/runtime APIs.

fix

For newer cuda-python versions (12.8.0 and above), the recommended way to import the driver and runtime APIs is `from cuda.bindings import driver as cuda` and `from cuda.bindings import runtime as cudart`. The `cuda-python` distribution now serves as a metapackage that pulls in `cuda-bindings`; `cuda` itself is only the top-level namespace the modules are installed under.
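Code that has to run against both old and new installs can try the new layout first and fall back to the legacy one. This is a sketch only; `import_driver_and_runtime` is a hypothetical helper, and the legacy branch assumes an older cuda-python in which `cuda.cuda` and `cuda.cudart` still exist.

```python
def import_driver_and_runtime():
    """Return (driver, runtime, layout), or (None, None, None) if nothing imports."""
    try:
        # Layout recommended for cuda-python 12.8.0 and newer.
        from cuda.bindings import driver, runtime
        return driver, runtime, "cuda.bindings"
    except ImportError:
        pass
    try:
        # Older layout, kept here only as a fallback for legacy installs.
        from cuda import cuda as driver, cudart as runtime
        return driver, runtime, "cuda (legacy)"
    except ImportError:
        return None, None, None


if __name__ == "__main__":
    _, _, layout = import_driver_and_runtime()
    print(f"Detected layout: {layout}")
```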

Warnings

breaking Mismatch between CUDA Toolkit, NVIDIA GPU driver, and `cuda-bindings` versions is a common source of runtime errors, including 'CUDA Driver Version Insufficient', 'No Kernel Image Available', or failure to find CUDA-enabled devices. ↓

fix Ensure that your installed NVIDIA GPU driver, CUDA Toolkit, and `cuda-bindings` Python package are compatible. Consult the NVIDIA CUDA Python documentation for compatibility matrices.

gotcha Updating `cuda-python` (which `cuda-bindings` is a part of) from older versions (e.g., v12.6.2.post1 and below) using `pip install -U cuda-python` might fail. ↓

fix Perform a clean re-installation by first uninstalling with `pip uninstall -y cuda-python` (or `pip uninstall -y cuda-bindings`) followed by a fresh `pip install cuda-python` (or `pip install cuda-bindings`).

gotcha `cuda-bindings` provides direct, low-level access to the CUDA C APIs. This requires explicit memory management, device context handling, and kernel configuration, which can be more complex than higher-level libraries like Numba CUDA or CuPy. ↓

fix Be prepared to work with C types (e.g., `ctypes`) and manage GPU resources explicitly. For many common scientific computing or deep learning tasks, higher-level libraries might offer a simpler abstraction. Consider `cuda.core` for more Pythonic access to CUDA runtime functionalities if raw C API interaction is not strictly necessary.
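One consequence of the low-level design is that calls in `cuda.bindings` return a tuple whose first element is a status code, and nothing is raised on failure. A small unpacking helper keeps that manageable. The sketch below is independent of the library so it can run anywhere: `FakeResult` is a hypothetical stand-in for a status enum such as `CUresult`, and `checked` is a hypothetical helper name that assumes the status converts to `int`, with 0 meaning success.

```python
from enum import IntEnum


class FakeResult(IntEnum):
    """Stand-in for a status enum such as CUresult; for this demo only."""
    SUCCESS = 0
    ERROR_OUT_OF_MEMORY = 2


def checked(result, success=0):
    """Unpack a (status, *values) tuple, raising if status is not the success code."""
    status, *values = result
    if int(status) != success:
        raise RuntimeError(f"CUDA call failed: {status!r}")
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


# A successful call that returns one value, then one that returns two.
print(checked((FakeResult.SUCCESS, 4)))
print(checked((FakeResult.SUCCESS, 1, 2)))
```

With the real bindings the same helper would be used as `count = checked(driver.cuDeviceGetCount())`.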

gotcha Out-of-memory (OOM) errors or illegal memory access can occur when dealing with large datasets or complex models, especially on GPUs with limited VRAM, or due to incorrect memory operations within CUDA kernels. ↓

fix Monitor GPU memory usage (`nvidia-smi`). Reduce batch sizes, optimize memory allocation patterns, and ensure correct buffer sizing and indexing in custom CUDA kernels. Explicitly free unused GPU memory if applicable.
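Memory can also be checked from inside the process before a large allocation. The sketch below assumes the runtime API's `cudaMemGetInfo`, which returns a `(status, free, total)` tuple in `cuda.bindings.runtime`; the 10% headroom default and both helper names are illustrative assumptions, not library features.

```python
def fits_in_free_memory(requested_bytes, free_bytes, headroom=0.10):
    """True if the request fits while leaving a fraction of free memory unused."""
    return requested_bytes <= free_bytes * (1.0 - headroom)


def query_free_and_total():
    """Return (free_bytes, total_bytes), or None when no GPU can be queried."""
    try:
        from cuda.bindings import runtime
        err, free_bytes, total_bytes = runtime.cudaMemGetInfo()
    except Exception:  # missing package or driver library
        return None
    if err != runtime.cudaError_t.cudaSuccess:
        return None
    return free_bytes, total_bytes


if __name__ == "__main__":
    info = query_free_and_total()
    if info is None:
        print("No GPU memory information available.")
    else:
        free_bytes, total_bytes = info
        print(f"free={free_bytes} total={total_bytes}")
        print("1 GiB fits:", fits_in_free_memory(1024**3, free_bytes))
```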

breaking The `cuda-python` or `cuda-bindings` package is not installed or not accessible in the current Python environment, leading to a `ModuleNotFoundError` when attempting to import `cuda.cuda`. ↓

fix Ensure the `cuda-python` package is installed using `pip install cuda-python` (or `pip install cuda-bindings`). If installed, verify that the Python environment (e.g., virtual environment) where the script is run has access to the installed package.

breaking Installation of `cuda-bindings` (or `cuda-python`) may fail with 'No matching distribution found' errors, particularly when using newer Python versions (e.g., 3.13) or non-standard operating system/architecture combinations (e.g., Alpine Linux, ARM). Pre-built wheels for `cuda-bindings` are often limited to specific Python versions and common `glibc`-based Linux distributions. ↓

fix Verify the availability of pre-built wheels for your specific Python version and operating system/architecture on the `cuda-bindings` PyPI project page (`cuda-python` itself only pulls in the component packages). If no wheels are available, consider using a supported Python version or a `glibc`-based Linux distribution. Building `cuda-bindings` from source requires a matching CUDA Toolkit and a compiler toolchain, so it is rarely the easier path.
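Whether pip finds a matching wheel depends on the Python version, the CPU architecture, and the C library. The sketch below collects those facts with the standard library so they can be compared against the files listed on PyPI; `wheel_environment` is a hypothetical helper name, and the fallback label for an undetected libc is a guess, not a definitive musl check.

```python
import platform
import sys


def wheel_environment():
    """Collect the interpreter and platform facts that decide wheel compatibility."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "system": platform.system(),
        "machine": platform.machine(),
        # Typically empty when glibc is not detected (e.g. on Alpine).
        "libc": libc_name or "unknown (possibly musl)",
        "libc_version": libc_version,
    }


if __name__ == "__main__":
    for key, value in wheel_environment().items():
        print(f"{key}: {value}")
```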

Install compatibility stale last tested: 2026-05-12

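| python | os / libc | status | wheel | install | import | disk |
|---|---|---|---|---|---|---|
| 3.10 | alpine (musl) | build_error | - | - | - | - |
| 3.10 | alpine (musl) | | - | - | - | - |
| 3.10 | slim (glibc) | | wheel | 1.8s | - | 42M |
| 3.10 | slim (glibc) | | - | - | - | - |
| 3.11 | alpine (musl) | build_error | - | - | - | - |
| 3.11 | alpine (musl) | | - | - | - | - |
| 3.11 | slim (glibc) | | wheel | 1.8s | - | 44M |
| 3.11 | slim (glibc) | | - | - | - | - |
| 3.12 | alpine (musl) | build_error | - | - | - | - |
| 3.12 | alpine (musl) | | - | - | - | - |
| 3.12 | slim (glibc) | | wheel | 1.7s | - | 35M |
| 3.12 | slim (glibc) | | - | - | - | - |
| 3.13 | alpine (musl) | build_error | - | - | - | - |
| 3.13 | alpine (musl) | | - | - | - | - |
| 3.13 | slim (glibc) | | wheel | 1.8s | - | 35M |
| 3.13 | slim (glibc) | | - | - | - | - |
| 3.9 | alpine (musl) | build_error | - | - | - | - |
| 3.9 | alpine (musl) | | - | - | - | - |
| 3.9 | slim (glibc) | | wheel | 2.6s | - | 125M |
| 3.9 | slim (glibc) | | - | - | - | - |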

Imports

cuInit
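wrong
```
import cuda.bindings.cuda as cu
```
correct
```
from cuda.bindings import driver as cu
# ... then call cu.cuInit(0)
```
In cuda-python 12.8.0 and above, the driver API lives in `cuda.bindings.driver`; there is no `cuda.bindings.cuda` module. The older `cuda.cuda` module belongs to the legacy layout and is no longer the recommended import.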

Quickstart last tested: 2026-04-24

This quickstart demonstrates how to initialize the CUDA driver, query the number of available CUDA devices, and print basic information for each, such as its name and total memory. It leverages the low-level CUDA C APIs exposed by `cuda-bindings`.

from cuda.bindings import driver as cu


def check(err):
    # Every driver call returns a tuple whose first element is a CUresult status
    if err != cu.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"CUDA driver call failed: {err}")


# Initialize CUDA Driver API
(err,) = cu.cuInit(0)
check(err)

# Get device count (results are returned, not written through ctypes pointers)
err, count = cu.cuDeviceGetCount()
check(err)
print(f"Found {count} CUDA devices.")

# Get properties for each device
for i in range(count):
    err, device = cu.cuDeviceGet(i)
    check(err)

    # The name comes back as bytes and may be NUL-padded to the requested length
    err, name = cu.cuDeviceGetName(256, device)
    check(err)
    device_name = name.split(b"\x00", 1)[0].decode()
    print(f"  Device {i}: {device_name}")

    err, total_mem = cu.cuDeviceTotalMem(device)
    check(err)
    print(f"    Total Memory: {total_mem / (1024**3):.2f} GiB")