CUDA Python Bindings
cuda-bindings provides low-level Python wrappers for the NVIDIA CUDA C driver and runtime APIs. It is a core component of NVIDIA's broader 'CUDA Python' initiative, which aims to unify and simplify GPU-accelerated computing in Python. The current version is 13.2.0. Releases generally track CUDA Toolkit versions, and development is ongoing to make Python a first-class language in the CUDA ecosystem.
Warnings
- breaking Mismatch between CUDA Toolkit, NVIDIA GPU driver, and `cuda-bindings` versions is a common source of runtime errors, including 'CUDA Driver Version Insufficient', 'No Kernel Image Available', or failure to find CUDA-enabled devices.
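- gotcha Upgrading `cuda-python` (the metapackage that includes `cuda-bindings`) in place from older versions (v12.6.2.post1 and below) with `pip install -U cuda-python` might fail. A clean reinstall (`pip uninstall -y cuda-python`, then `pip install cuda-python`) avoids the problem.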
- gotcha `cuda-bindings` provides direct, low-level access to the CUDA C APIs. This requires explicit memory management, device context handling, and kernel configuration, which can be more complex than higher-level libraries like Numba CUDA or CuPy.
- gotcha Out-of-memory (OOM) errors or illegal memory access can occur when dealing with large datasets or complex models, especially on GPUs with limited VRAM, or due to incorrect memory operations within CUDA kernels.
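The version-mismatch warning above can be checked from Python. The following is a minimal sketch, not an official compatibility test: `decode_cuda_version`, `driver_at_least`, and `report_versions` are hypothetical helper names, and the comparison is a deliberately conservative heuristic that ignores CUDA's minor-version compatibility rules. It assumes the bindings are importable as `cuda.bindings.driver` and `cuda.bindings.runtime`.

```python
def decode_cuda_version(encoded: int) -> tuple:
    """Split CUDA's integer encoding (1000 * major + 10 * minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_at_least(driver_version: int, runtime_version: int) -> bool:
    """Conservative heuristic: True if the driver's CUDA version is not older than the runtime's."""
    return decode_cuda_version(driver_version) >= decode_cuda_version(runtime_version)


def report_versions() -> None:
    """Print the driver and runtime CUDA versions (needs cuda-bindings and an NVIDIA driver)."""
    # Imported lazily so the pure helpers above work without cuda-bindings installed.
    from cuda.bindings import driver, runtime

    err, drv = driver.cuDriverGetVersion()
    if err != driver.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"cuDriverGetVersion failed: {err}")
    err, rt = runtime.cudaRuntimeGetVersion()
    if err != runtime.cudaError_t.cudaSuccess:
        raise RuntimeError(f"cudaRuntimeGetVersion failed: {err}")

    print("driver supports CUDA", "%d.%d" % decode_cuda_version(drv))
    print("runtime version is  ", "%d.%d" % decode_cuda_version(rt))
    if not driver_at_least(drv, rt):
        print("warning: driver is older than the runtime; consider updating the GPU driver")
```

Calling `report_versions()` on a machine with a working driver prints both versions. The two pure helpers can be used on their own, for example on version numbers collected from logs.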
Install
-
pip install cuda-bindings
Imports
- cuInit
from cuda.bindings import driver # ... then call driver.cuInit(0)
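Because the module path has changed across releases, code that must run against several versions may want a fallback import. The sketch below assumes that current releases expose the driver API as `cuda.bindings.driver` and that older releases exposed it as `cuda.cuda` (deprecated, and possibly absent in recent versions). `import_first` is a hypothetical helper, not part of the library.

```python
import importlib


def import_first(candidates):
    """Return the first module in `candidates` that imports, or None if none do."""
    for path in candidates:
        try:
            return importlib.import_module(path)
        except ImportError:
            continue
    return None


# Prefer the current module path, then fall back to the legacy one.
driver = import_first(("cuda.bindings.driver", "cuda.cuda"))
if driver is None:
    print("cuda-bindings is not installed; try: pip install cuda-bindings")
else:
    print("using driver API from", driver.__name__)
```

Returning `None` instead of raising lets a caller decide whether a missing GPU stack is fatal or merely disables an optional code path.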
Quickstart
from cuda.bindings import driver
def check(err):
    # Raise if a driver call did not report CUDA_SUCCESS
    if err != driver.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"CUDA driver error: {err}")
# Initialize CUDA Driver API. Each call returns a tuple: (status, *results)
(err,) = driver.cuInit(0)
check(err)
# Get device count
err, count = driver.cuDeviceGetCount()
check(err)
print(f"Found {count} CUDA devices.")
# Get properties for each device
for i in range(count):
    err, device = driver.cuDeviceGet(i)
    check(err)
    # The name comes back as a fixed-length bytes buffer; keep the part before the first NUL
    err, raw_name = driver.cuDeviceGetName(256, device)
    check(err)
    name = raw_name.split(b"\x00", 1)[0].decode().strip()
    print(f"  Device {i}: {name}")
    err, total_mem = driver.cuDeviceTotalMem(device)
    check(err)
    print(f"    Total Memory: {total_mem / (1024**3):.2f} GiB")
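Calls in cuda-bindings return a tuple whose first element is a status code, followed by any results, so checking each status by hand gets repetitive. A minimal sketch of a helper that folds this into one step is shown below. `unwrap` and `FakeResult` are hypothetical names used for illustration; `FakeResult` stands in for `driver.CUresult` so the example runs without a GPU, and the sketch assumes success is always encoded as 0 (true for `CUDA_SUCCESS` and `cudaSuccess`).

```python
from enum import IntEnum


def unwrap(result):
    """Split a (status, *values) tuple; raise on a non-zero status, else return the values."""
    status, *values = result
    if int(status) != 0:
        raise RuntimeError(f"CUDA call failed: {status!r}")
    if not values:
        return None          # calls such as cuInit return only a status
    if len(values) == 1:
        return values[0]     # most calls return a single result
    return tuple(values)     # a few calls return several results


class FakeResult(IntEnum):
    """Hypothetical stand-in for driver.CUresult, for demonstration only."""
    SUCCESS = 0
    ERROR_INVALID_VALUE = 1


# Simulated return tuples in the shape the bindings use.
print(unwrap((FakeResult.SUCCESS, 2)))
try:
    unwrap((FakeResult.ERROR_INVALID_VALUE,))
except RuntimeError as exc:
    print("caught:", exc)
```

With the real bindings the same helper would be used as `count = unwrap(driver.cuDeviceGetCount())`. Raising an exception keeps error handling in one place, at the cost of hiding the raw status code from callers that want to branch on it.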