PyCUDA

2026.1 verified Fri May 01 auth: no python

PyCUDA is a Python wrapper that gives access to NVIDIA's CUDA parallel computation API. The latest version is 2026.1 (requires Python ~=3.8). Releases are roughly semi-annual.

pip install pycuda
error pycuda.driver.Error: CUDA_ERROR_NO_DEVICE
cause No CUDA-capable GPU was detected, or the NVIDIA driver is not installed.
fix
Run nvidia-smi to confirm the GPU is visible; if it is not, install the NVIDIA driver and the CUDA toolkit.
error ModuleNotFoundError: No module named 'pycuda'
cause PyCUDA not installed in current Python environment.
fix
Run pip install pycuda in the same Python environment that runs your script.
error ImportError: libcuda.so.1: cannot open shared object file
cause CUDA driver library not found in library path.
fix
Ensure the NVIDIA driver is installed (libcuda.so.1 ships with the driver, not the toolkit) and that the directory containing it is on the loader path, for example via LD_LIBRARY_PATH.
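
The three errors above can be narrowed down before launching any kernel. The following is a minimal diagnostic sketch that uses only the standard library for the first three checks; the function names `diagnose_pycuda_environment` and `cuda_device_count` are hypothetical helpers, not part of PyCUDA, and the checks are coarse hints rather than a definitive test.

```python
import ctypes.util
import importlib.util
import shutil


def diagnose_pycuda_environment():
    """Return coarse checks that map onto the three errors listed above."""
    return {
        # False corresponds to "ModuleNotFoundError: No module named 'pycuda'".
        "pycuda_installed": importlib.util.find_spec("pycuda") is not None,
        # None suggests the loader cannot find libcuda (the ImportError case).
        "libcuda": ctypes.util.find_library("cuda"),
        # None suggests the NVIDIA driver utilities are missing from PATH.
        "nvidia_smi": shutil.which("nvidia-smi"),
    }


def cuda_device_count():
    """Return the number of CUDA devices, or None if they cannot be queried."""
    try:
        import pycuda.driver as cuda  # imported lazily; may be absent

        cuda.init()  # fails when no device or driver is available
        return cuda.Device.count()
    except Exception:
        # Covers a missing module, a missing libcuda, and CUDA_ERROR_NO_DEVICE.
        return None


if __name__ == "__main__":
    for name, value in diagnose_pycuda_environment().items():
        print(f"{name}: {value}")
    print(f"device_count: {cuda_device_count()}")
```

A result of `None` from `cuda_device_count()` does not say which of the three errors applies; the dictionary from `diagnose_pycuda_environment()` is what points to the likely cause.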
breaking CUDA 13.x compatibility: PyCUDA v2025.1.2 added a fix for the cuCtxCreate API change. Earlier versions may crash with CUDA 13.
fix Upgrade to pycuda>=2025.1.2 or use CUDA 12.x.
deprecated appdirs replaced by platformdirs in v2024.1.1; if you depend on appdirs directly, update your imports.
fix Use `platformdirs` instead of `appdirs`.
gotcha Memory transfers require explicit use of `cuda.In`, `cuda.Out`, `cuda.InOut`, or `cuda.mem_alloc`/`cuda.memcpy_htod`. Passing a host NumPy array directly where the kernel expects a device pointer will not work.
fix Wrap each array in the handler that matches its use: `cuda.In(a)` for inputs, `cuda.Out(a)` for outputs, `cuda.InOut(a)` for read-write.
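
The explicit route named in the gotcha above (`cuda.mem_alloc` plus `cuda.memcpy_htod`) can be sketched as follows. This is an illustrative example, not the only way to structure it: the helper names `grid_size` and `add_one_explicit`, the 128-thread block size, and the bounds-checked kernel are assumptions made for the sketch. It assumes a non-empty input array and a working GPU at call time.

```python
import numpy as np

# Kernel with a bounds check so that surplus threads never write past the end.
KERNEL_SOURCE = """
__global__ void add_one(float *a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        a[idx] += 1.0f;
}
"""


def grid_size(n, threads_per_block):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + threads_per_block - 1) // threads_per_block


def add_one_explicit(host_array, threads_per_block=128):
    """Add 1 to every element using explicit allocation and copies."""
    # Imported lazily so that defining this function does not require a GPU.
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    a = np.ascontiguousarray(host_array, dtype=np.float32)
    kernel = SourceModule(KERNEL_SOURCE).get_function("add_one")

    a_gpu = cuda.mem_alloc(a.nbytes)  # device buffer of the same size
    cuda.memcpy_htod(a_gpu, a)        # host -> device

    blocks = grid_size(a.size, threads_per_block)
    kernel(a_gpu, np.int32(a.size),
           block=(threads_per_block, 1, 1), grid=(blocks, 1))

    result = np.empty_like(a)
    cuda.memcpy_dtoh(result, a_gpu)   # device -> host
    return result
```

On a machine with a CUDA device, `add_one_explicit(np.float32([1.0, 2.0, 3.0]))` should return the input with 1 added to each element. The explicit form is more verbose than `cuda.InOut`, but it lets a buffer stay on the device across several kernel launches.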

Kernel that adds 1 to each element of a float array.

import pycuda.autoinit
import pycuda.driver as cuda
import numpy as np
from pycuda.compiler import SourceModule

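mod = SourceModule("""
__global__ void add_one(float *a, int n)
{
    // One thread per element; the guard stops surplus threads
    // from writing past the end of the array.
    int idx = threadIdx.x;
    if (idx < n)
        a[idx] += 1.0f;
}
""")

add_one = mod.get_function("add_one")
a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
# cuda.InOut copies `a` to the device before the launch and back afterwards.
# Scalars must be passed as sized NumPy types, hence np.int32.
add_one(cuda.InOut(a), np.int32(a.size), block=(32, 1, 1), grid=(1, 1))
print(a)  # [2. 3. 4.]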