CuPy (CUDA 13.x)
CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python, acting as a drop-in replacement for existing NumPy/SciPy code on NVIDIA CUDA platforms. It leverages CUDA Toolkit libraries such as cuBLAS and cuFFT for significant speedups in numerical computations on GPUs. The current version is 14.0.1; major releases are infrequent (v14 was the first major release in two years), while minor and patch updates are more common.
Common errors
- ModuleNotFoundError: No module named 'cupy'
  Cause: CuPy was not installed in the active Python environment, or environment variables (such as PATH) were not reloaded after installation, particularly when a CUDA Toolkit was installed alongside it.
  Fix: Ensure you are in the correct virtual environment. If CuPy was just installed, restart your Python script, IDE, or terminal to refresh environment variables. Verify the installation with `pip freeze | grep cupy`.
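A standard-library-only diagnostic sketch for this situation: it prints which interpreter is actually running (a frequent cause is installing CuPy into one environment and running another) and checks whether `cupy` is visible without importing it.

```python
import importlib.util
import sys

# Show which interpreter is active; CuPy may be installed for a
# different one than the script is running under.
print("Interpreter:", sys.executable)

# Check whether the 'cupy' package is visible to this interpreter
# without importing it (importing would initialize CUDA).
spec = importlib.util.find_spec("cupy")
print("cupy importable:", spec is not None)
```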
- TypeError: Argument 'x' has incorrect type (expected cupy.core.core.ndarray, got numpy.ndarray)
  Cause: A NumPy array (CPU-resident) was passed directly to a CuPy function that expects a CuPy array (GPU-resident).
  Fix: Convert the NumPy array with `cp.asarray()` or `cp.array()` before passing it to CuPy functions, e.g. `gpu_array = cp.asarray(numpy_array)`.
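A minimal sketch of the host-to-device round trip; it falls back to NumPy when CuPy or a GPU is absent, so the same arithmetic runs either way.

```python
import numpy as np

try:
    import cupy as cp
    on_gpu = cp.cuda.is_available()  # requires a CUDA driver and device
except ImportError:
    on_gpu = False

x_cpu = np.arange(6, dtype=np.float32)

if on_gpu:
    x_gpu = cp.asarray(x_cpu)   # copy host -> device
    y = cp.asnumpy(x_gpu * 2)   # compute on the GPU, copy device -> host
else:
    y = x_cpu * 2               # NumPy fallback on the CPU

print(y)
```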
- cupy.cuda.compiler.CompileException: nvrtc: error: failed to load builtins; catastrophic error: cannot open source file "cuda_fp16.h"
  Cause: CuPy's runtime compiler (NVRTC) cannot find the required CUDA header files, usually because of an incorrect or incomplete CUDA Toolkit installation, or because `CUDA_PATH` or `LD_LIBRARY_PATH` is not set correctly.
  Fix: Verify your CUDA Toolkit installation. Set `CUDA_PATH` or `LD_LIBRARY_PATH` if CUDA lives in a non-standard location. With the PyPI `[ctk]` installation, make sure the `nvidia-cuda-runtime-cuXX` package that provides the headers is correctly installed. You may need to explicitly install `cuda-cudart-dev-12-X` (CUDA 12) or `cuda-cudart-dev-13-X` (CUDA 13).
- TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
  Cause: A CuPy array was implicitly converted to a NumPy array in a context where explicit conversion is required, such as passing it to a NumPy-only function.
  Fix: Explicitly convert the CuPy array with `cupy.asnumpy()` or the `.get()` method, e.g. `cpu_array = gpu_array.get()` or `cpu_array = cp.asnumpy(gpu_array)`.
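A sketch of the explicit transfer wrapped in a small helper; the helper name `to_host` is illustrative, not a CuPy API. It returns a NumPy array regardless of where the input lives, which is handy in code paths that may receive either array type.

```python
import numpy as np

try:
    import cupy as cp
    have_gpu = cp.cuda.is_available()
except ImportError:
    have_gpu = False

def to_host(arr):
    """Return a NumPy array regardless of where 'arr' lives."""
    if have_gpu and isinstance(arr, cp.ndarray):
        return arr.get()        # explicit device -> host copy
    return np.asarray(arr)      # already on the host

data = cp.arange(4) if have_gpu else np.arange(4)
host = to_host(data)
print(type(host).__name__, host.sum())  # ndarray 6
```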
Warnings
- breaking CuPy v14 aligns its behavior with NumPy 2 semantics, which includes changes to type promotion rules and casting behavior. Code relying on older NumPy 1.x type promotion might behave differently.
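The shift can be illustrated with NumPy itself, since CuPy v14 follows NumPy 2 (NEP 50) promotion; the classic example is a float32 scalar plus a Python float, which no longer widens to float64.

```python
import numpy as np

# Under NEP 50 (NumPy 2 / CuPy v14) the Python float is "weak" and the
# result stays float32; NumPy 1.x value-based promotion gave float64.
r = np.float32(3.0) + 3.0
major = int(np.__version__.split(".")[0])
print(np.__version__, r.dtype)
```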
- breaking CuPy v14 has completely removed all cuDNN-related functionality. Direct usage of `cupy.cuda.cudnn` will fail.
- breaking Support for CUDA 11 and Python 3.9 has been dropped in CuPy v14. Users on these older environments must upgrade.
- gotcha Installing `cupy-cuda13x` requires a compatible NVIDIA CUDA Toolkit 13.x installation or driver. Mismatches in CUDA versions between the installed CuPy wheel and the system's CUDA Toolkit can lead to `ImportError` or runtime compilation errors.
- gotcha Initial execution of CuPy functions can be slower than subsequent calls due to just-in-time compilation and caching of CUDA kernels.
- gotcha GPU operations in CuPy are asynchronous by default. For accurate timing of GPU execution in benchmarks or to ensure operations complete before host interaction, explicit synchronization is necessary.
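A timing sketch covering the last two gotchas at once: a warm-up call absorbs the one-time kernel compilation, and explicit synchronization makes the clock measure the kernel rather than just the asynchronous launch (with a CPU-only fallback when CuPy is unavailable).

```python
import time
import numpy as np

try:
    import cupy as cp
    gpu = cp.cuda.is_available()
except ImportError:
    gpu = False

xp = cp if gpu else np
a = xp.ones((512, 512), dtype=xp.float32)

_ = a @ a                                  # warm-up: triggers kernel JIT/caching on GPU
if gpu:
    cp.cuda.Stream.null.synchronize()

t0 = time.perf_counter()
b = a @ a                                  # on the GPU, this launch returns immediately
if gpu:
    cp.cuda.Stream.null.synchronize()      # wait for the kernel before stopping the clock
elapsed = time.perf_counter() - t0
print(f"matmul took {elapsed * 1e3:.3f} ms")
```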
Install
- pip install cupy-cuda13x
- pip install 'cupy-cuda13x[ctk]'
Imports
- cupy
import cupy as cp
- cupyx.scipy
import cupyx.scipy
import cupyx.scipy as cpxs
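A short sketch using `cupyx.scipy.ndimage.gaussian_filter` as a representative routine (it mirrors `scipy.ndimage.gaussian_filter` for common arguments); the fallback path simply skips the GPU filter when CuPy is unavailable.

```python
import numpy as np

try:
    import cupy as cp
    import cupyx.scipy.ndimage as cnd
    have_cupy = cp.cuda.is_available()
except ImportError:
    have_cupy = False

img = np.random.rand(64, 64).astype(np.float32)

if have_cupy:
    # Filter on the GPU, then bring the result back to the host.
    blurred = cnd.gaussian_filter(cp.asarray(img), sigma=2.0).get()
else:
    blurred = img  # no-op fallback when CuPy is not installed

print(blurred.shape, blurred.dtype)  # (64, 64) float32
```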
Quickstart
import cupy as cp
import numpy as np

# Check if a GPU is available
if cp.cuda.is_available():
    print(f"CuPy is available. Current device: {cp.cuda.Device().id}")

    # Create a CuPy array on the GPU
    x_gpu = cp.arange(10, dtype=cp.float32).reshape(2, 5)
    print(f"GPU array:\n{x_gpu}")
    print(f"Type of GPU array: {type(x_gpu)}")

    # Perform a computation on the GPU
    y_gpu = x_gpu * 2 + 1
    print(f"Result of computation on GPU:\n{y_gpu}")

    # Transfer the result back to a CPU NumPy array
    y_cpu = cp.asnumpy(y_gpu)
    print(f"CPU array (from GPU):\n{y_cpu}")
    print(f"Type of CPU array: {type(y_cpu)}")

    # Demonstrate a simple NumPy-like operation
    sum_gpu = x_gpu.sum(axis=1)
    print(f"Sum along axis 1 on GPU: {sum_gpu}")
    print(f"Type of sum on GPU: {type(sum_gpu)}")

    # Ensure all GPU operations complete before proceeding (useful for timing)
    cp.cuda.Stream.null.synchronize()
else:
    print("No NVIDIA GPU found or CuPy is not properly installed for CUDA.")
    print("Falling back to NumPy for demonstration.")
    x_cpu = np.arange(10, dtype=np.float32).reshape(2, 5)
    print(f"CPU array:\n{x_cpu}")