NVIDIA cuSOLVER Python Bindings
The `nvidia-cusolver` package is a metapackage that provides the native runtime libraries for NVIDIA's cuSOLVER, together with Python bindings for its GPU-accelerated numerical linear algebra routines. This allows Python applications to leverage GPU power for tasks such as solving dense and sparse linear systems, eigenvalue problems, and singular value decompositions. The package is tightly coupled to the NVIDIA CUDA Toolkit: the current version, 12.1.0.51, targets CUDA Toolkit 12.1, and its release cadence follows major CUDA Toolkit updates.
Warnings
- breaking The `nvidia-cusolver` package versions are tightly coupled with specific NVIDIA CUDA Toolkit versions (e.g., `cu12` implies CUDA 12.x). Using a version of `nvidia-cusolver` incompatible with your system's CUDA Toolkit or GPU driver can lead to runtime errors or incorrect results.
- gotcha This library requires a compatible NVIDIA GPU and driver to function. Running Python code that imports `cuda.cusolver` on a system without these prerequisites will fail at import time or raise `CUDA_ERROR_NO_DEVICE` (or a similar error) at runtime.
- gotcha The `nvidia-cusolver` package is a metapackage that bundles the native libraries and `nvidia-cusolver-bindings`. The actual Python module for cuSOLVER functionality is imported as `from cuda import cusolver`, not as `import nvidia_cusolver`.
- gotcha cuSOLVER operates on GPU memory. You will need a compatible Python array library like CuPy (highly recommended) or Numba's CUDA device arrays to easily create, transfer, and manage data on the GPU for use with `cuda.cusolver` functions.
- gotcha The `cuda.cusolver` API is a low-level wrapper around the C cuSOLVER library. Users are therefore responsible for manual memory management (e.g., allocating workspace scratch buffers), handle management (creating and destroying handles), and explicit error checking via device-side info arrays.
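Since paired handle creation/destruction recurs in every call sequence, a small helper can make the pattern harder to get wrong. This is a minimal sketch that assumes only that the binding exposes paired create/destroy functions; they are passed in explicitly because their exact names vary by binding version:

```python
from contextlib import contextmanager

@contextmanager
def solver_handle(create, destroy):
    """Scoped cuSOLVER-style handle: create on entry, destroy on exit.

    `create`/`destroy` are the binding's paired handle functions
    (hypothetical names such as create_handle/destroy_handle); they are
    parameters here because exact spellings differ across bindings.
    """
    handle = create()
    try:
        yield handle
    finally:
        # Destroy runs even if the body raises, preventing handle leaks.
        destroy(handle)
```

With a helper like this, the try/finally boilerplate in the Quickstart below collapses to a single `with solver_handle(...) as handle:` block.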
Install
- pip install nvidia-cusolver-cu12
- pip install cupy-cuda12x
Imports
- cusolver
from cuda import cusolver
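Because this import fails outright on machines without the NVIDIA stack (see the warnings above), a guarded import lets library code degrade gracefully instead of crashing at import time. A small sketch, assuming the import path shown above:

```python
try:
    # Import path as packaged by nvidia-cusolver (see Warnings above).
    from cuda import cusolver
    HAVE_CUSOLVER = True
except ImportError:
    # No GPU stack present: expose a flag callers can branch on.
    cusolver = None
    HAVE_CUSOLVER = False
```

Downstream code can then check `HAVE_CUSOLVER` and fall back to a CPU path where appropriate.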
Quickstart
import cupy as cp
from cuda import cusolver

# Build a symmetric positive-definite matrix directly on the GPU with CuPy.
# (The CUDA context is created implicitly on the first CuPy allocation.)
n = 4
A_device = cp.array([
    [4.0, 1.0, 1.0, 1.0],
    [1.0, 3.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
], dtype=cp.float32)
A_original = A_device.copy()  # spotrf factors in place, so keep a copy

# Device-side status code written by the factorization (0 == success).
info_device = cp.zeros(1, dtype=cp.int32)

handle = None
try:
    handle = cusolver.create_handle()
    # Factor the lower triangle: A = L * L^T.
    # (The exact spelling of the fill-mode enum depends on the binding version.)
    uplo = cusolver.cudaSolver_fact_info.CUSOLVER_FILL_MODE_LOWER
    # Query the required workspace size for spotrf (single-precision Cholesky),
    # then allocate the scratch buffer ourselves.
    lwork = cusolver.spotrf_bufferSize(handle, uplo, n, A_device.data.ptr, n)
    workspace = cp.zeros(lwork, dtype=cp.float32)
    # Perform the Cholesky factorization in place.
    cusolver.spotrf(handle, uplo, n, A_device.data.ptr, n,
                    workspace.data.ptr, lwork, info_device.data.ptr)
    # Check the device-side info value.
    info = int(info_device.get()[0])
    if info != 0:
        print(f"Cholesky factorization failed with info = {info}")
    else:
        print("Original matrix:\n", A_original.get())
        print("Cholesky factor L (lower triangle of A_device):\n", A_device.get())
finally:
    if handle:
        cusolver.destroy_handle(handle)
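Independent of the GPU path, the factor the Quickstart should produce can be sanity-checked on the host with NumPy, which factors the same matrix on the CPU:

```python
import numpy as np

# Same SPD matrix as the Quickstart, factored on the CPU for reference.
A = np.array([
    [4.0, 1.0, 1.0, 1.0],
    [1.0, 3.0, 0.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
], dtype=np.float32)

L = np.linalg.cholesky(A)  # lower-triangular Cholesky factor
assert np.allclose(L @ L.T, A, atol=1e-5)  # round-trip check: L * L^T == A
print(np.round(L, 3))
```

The lower triangle that `spotrf` writes into `A_device` should match this `L` to single precision, which makes the NumPy result a convenient reference when debugging the GPU path.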