NVIDIA cuSPARSE (CUDA 12)

12.5.10.65 verified Tue May 12 auth: no python install: stale quickstart: stale

The `nvidia-cusparse-cu12` package provides the native runtime libraries for NVIDIA's cuSPARSE, a GPU-accelerated library for sparse matrix computations, specifically compatible with CUDA Toolkit 12.x. It offers highly optimized basic linear algebra subroutines for sparse matrices, enabling faster computations than CPU-only alternatives in fields like machine learning, AI, and scientific computing. This package is part of a series of NVIDIA-provided Python wheels that make CUDA runtime components available via PyPI, with the current version being 12.5.10.65. New versions are released in alignment with CUDA Toolkit updates.

pip install nvidia-cusparse-cu12
error ImportError: ... libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12
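cause `libcusparse.so.12` was loaded together with an older `libnvJitLink.so.12` that does not export the symbol it needs (here, one introduced with CUDA 12.4). This typically happens when the `nvidia-cusparse-cu12` and `nvidia-nvjitlink-cu12` wheels come from different CUDA 12 minor releases, or when a system-wide CUDA installation on `LD_LIBRARY_PATH` shadows the libraries bundled with the deep learning framework (like PyTorch).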
fix
Ensure that your NVIDIA drivers, CUDA Toolkit, nvidia-cusparse-cu12, nvidia-nvjitlink-cu12, and the deep learning framework (e.g., PyTorch) are all compatible. Downgrading or upgrading specific packages (especially torch or nvidia-nvjitlink-cu12) to a compatible set often resolves this. Sometimes, unsetting the LD_LIBRARY_PATH environment variable can also fix the issue.
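Before upgrading or downgrading anything, it helps to see which versions are actually installed. The following is a minimal sketch using only the standard library; the `PACKAGES` list is an assumption about which distributions matter in a typical PyTorch environment and can be extended.

```python
from importlib import metadata

# Packages whose versions need to line up (assumed list; extend as needed).
PACKAGES = ["torch", "nvidia-cusparse-cu12", "nvidia-nvjitlink-cu12"]


def installed_versions(names):
    """Return {package name: installed version, or None if absent}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

If `nvidia-cusparse-cu12` and `nvidia-nvjitlink-cu12` report different CUDA 12 minor versions, that pair is the first thing to align.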
error ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. torch X.Y.Z requires nvidia-cusparse-cu12==A.B.C; ..., but you have nvidia-cusparse-cu12 D.E.F which is incompatible.
cause This warning/error indicates that the version of `nvidia-cusparse-cu12` currently installed or being installed conflicts with the specific version required by another package, most commonly a deep learning framework like PyTorch or TensorFlow.
fix
Install the exact version of nvidia-cusparse-cu12 that the demanding package (e.g., torch) requires. Alternatively, install the deep learning framework using its recommended installation method for CUDA support, which often handles these dependencies correctly (e.g., pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121).
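To find the exact version a framework pins, its installed metadata can be inspected. This is a hedged sketch: `pinned_version` and `required_by` are hypothetical helper names, and the parser assumes the requirement is written in the `name==version; marker` form, which may not hold for every package.

```python
from importlib import metadata


def pinned_version(requirements, package):
    """Return the version pinned with '==' for `package`, or None.

    `requirements` is a list of requirement strings such as
    'nvidia-cusparse-cu12==A.B.C; platform_system == "Linux"'.
    """
    for req in requirements or []:
        spec = req.split(";", 1)[0].strip()  # drop the environment marker
        name, sep, version = spec.partition("==")
        if sep and name.strip().lower() == package.lower():
            return version.strip()
    return None


def required_by(dist, package):
    """Look up the version of `package` that the installed `dist` pins."""
    try:
        return pinned_version(metadata.requires(dist), package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None when torch is absent or does not pin the package.
    print(required_by("torch", "nvidia-cusparse-cu12"))
```

The printed version, if any, is the one to pass to `pip install nvidia-cusparse-cu12==<version>`.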
error RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
cause While this error explicitly mentions CUBLAS, it signifies a general failure in initializing a core CUDA library, which can affect any CUDA-dependent operation, including those relying on cuSPARSE. Common causes include outdated or incompatible NVIDIA drivers/CUDA Toolkit, insufficient GPU memory, or issues with GPU device access.
fix
Verify that your NVIDIA GPU drivers and CUDA Toolkit are correctly installed and compatible with your deep learning environment. Free up GPU memory by reducing batch sizes, closing other GPU-intensive applications, or restarting your system. Ensure your GPU is properly recognized and accessible.
breaking A mismatch between the `nvidia-cusparse-cu12` version and the installed CUDA Toolkit or GPU driver can lead to `ImportError: undefined symbol` errors. This commonly occurs when `torch` or `tensorflow` dependencies request a specific `nvidia-cusparse-cu12` version that doesn't align with your system's CUDA setup. [18, 20, 21, 23]
fix Ensure that the CUDA Toolkit version on your system, your GPU driver, and all `nvidia-*-cu12` Python packages (including `nvidia-cusparse-cu12`) are compatible. Check the version requirements of higher-level libraries (e.g., PyTorch, TensorFlow) and install `nvidia-cusparse-cu12` (and related `nvidia-*` packages) that match their stated CUDA compatibility.
gotcha The `nvidia-cusparse-cu12` package provides native C++ runtime libraries and does not expose a direct Python API for `cusparse` functions. Attempting to `import cusparse` directly or find Python bindings within this package will fail. [13, 15]
fix Interact with cuSPARSE functionality through higher-level Python libraries that provide Pythonic interfaces and utilize these underlying native libraries, such as CuPy's `cupyx.scipy.sparse` module (formerly `cupy.sparse`), or sparse tensor operations in PyTorch and TensorFlow.
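To confirm that the wheel ships only native libraries, the files recorded in its metadata can be listed. A minimal sketch, assuming the wheel records its files in the usual way; `shared_libraries` is a hypothetical helper name, and the filename filter is a simple heuristic.

```python
from importlib import metadata


def shared_libraries(dist="nvidia-cusparse-cu12"):
    """List shared-library files recorded in the distribution's metadata."""
    try:
        files = metadata.files(dist) or []  # None when no file record exists
    except metadata.PackageNotFoundError:
        return []
    # Heuristic: Linux libraries contain ".so", Windows ones end in ".dll".
    return [str(f) for f in files if ".so" in f.name or f.name.endswith(".dll")]


if __name__ == "__main__":
    libs = shared_libraries()
    print(libs or "nvidia-cusparse-cu12 is not installed in this environment")
```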
gotcha For CUDA 12.4 and later, the `cusparseSpMV` routine might cause invalid memory accesses if the output vector is not 16-byte aligned. This can lead to crashes or incorrect results. [24]
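fix Ensure that the device pointer passed to `cusparseSpMV` (or to a high-level library function that wraps it) for the output vector is 16-byte aligned. A pointer returned directly by `cudaMalloc` is already aligned to at least 256 bytes, so the risk comes from pointers offset into a larger allocation, such as a slice or sub-view of an existing buffer. Python wrappers such as CuPy usually handle fresh allocations correctly, but an output array that is a view starting at an arbitrary element offset may still need an explicit alignment check or a copy into a new buffer.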
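The alignment condition is a simple modulus test on the device address. The sketch below illustrates it with plain integers (the base address is a made-up value) and, if CuPy and a GPU happen to be available, with a real array and a slice of it; `is_aligned` is a hypothetical helper, not part of any library.

```python
def is_aligned(address, alignment=16):
    """True if an integer address is a multiple of `alignment` bytes."""
    return address % alignment == 0


# Plain-integer illustration with a hypothetical 256-byte-aligned base address.
base = 0x7F0000000100
itemsize = 4  # bytes per float32 element

print(is_aligned(base))                 # True: start of the allocation
print(is_aligned(base + 1 * itemsize))  # False: view starting at element 1
print(is_aligned(base + 4 * itemsize))  # True: view starting at element 4

# Optional check against real device pointers (skipped without CuPy or a GPU).
try:
    import cupy as cp

    if cp.cuda.is_available():
        y = cp.empty(8, dtype=cp.float32)
        print(is_aligned(y.data.ptr), is_aligned(y[1:].data.ptr))
except ImportError:
    pass
```

When a view turns out to be misaligned, copying it into a freshly allocated array is one straightforward way to get an aligned output buffer.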
gotcha This library is distributed under an NVIDIA Proprietary Software License (LicenseRef-NVIDIA-Proprietary), which may have different terms and conditions compared to open-source licenses. [2]
fix Review the NVIDIA Proprietary Software License to ensure compliance with its terms for redistribution, modification, and usage in commercial or proprietary applications.
gotcha Some examples or tests bundled with `nvidia-cusparse-cu12` might require additional Python libraries, such as CuPy. If these dependencies are not installed, attempts to run these examples will result in an error indicating the missing module.
fix Ensure that all dependencies required by the specific example or test being run are installed. Follow any installation instructions provided with the examples, such as `pip install cupy-cuda12x`.
breaking Some `nvidia-*` packages on PyPI.org are placeholders: installing one triggers a `RuntimeError` during the build, indicating that the package is not found or is a placeholder, because the real package is hosted on NVIDIA's own package index. For `nvidia-cusparse-cu12`, the install results below show a wheel installing on glibc-based images and a build error on Alpine (musl), so this failure is most likely where no compatible binary wheel exists for the platform.
fix If you hit the placeholder error, install the `nvidia-pyindex` package or configure pip to use `https://pypi.nvidia.com` as an extra index URL, for example: `pip install nvidia-pyindex` followed by `pip install nvidia-cusparse-cu12`. On a glibc-based platform, a plain `pip install nvidia-cusparse-cu12` installs from a wheel in the results below.
python  os / libc      status       wheel  install  import  disk
3.10    alpine (musl)  build_error  -      -        -       -
3.10    alpine (musl)               -      -        -       -
3.10    slim (glibc)                wheel  10.5s    -       574M
3.10    slim (glibc)                -      -        -       -
3.11    alpine (musl)  build_error  -      -        -       -
3.11    alpine (musl)               -      -        -       -
3.11    slim (glibc)                wheel  9.8s     -       576M
3.11    slim (glibc)                -      -        -       -
3.12    alpine (musl)  build_error  -      -        -       -
3.12    alpine (musl)               -      -        -       -
3.12    slim (glibc)                wheel  9.4s     -       568M
3.12    slim (glibc)                -      -        -       -
3.13    alpine (musl)  build_error  -      -        -       -
3.13    alpine (musl)               -      -        -       -
3.13    slim (glibc)                wheel  9.0s     -       568M
3.13    slim (glibc)                -      -        -       -
3.9     alpine (musl)  build_error  -      -        -       -
3.9     alpine (musl)               -      -        -       -
3.9     slim (glibc)                wheel  10.8s    -       574M
3.9     slim (glibc)                -      -        -       -
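Since the results above differ by libc, a script can check the C library before attempting an install. This is only a heuristic sketch: `wheel_likely_available` is a hypothetical helper that encodes the pattern in the table, and `platform.libc_ver()` returning an empty name means "not detected as glibc", which covers musl but also non-Linux systems.

```python
import platform


def wheel_likely_available(libc_name):
    """Heuristic from the table: glibc installs from a wheel, musl does not."""
    return libc_name == "glibc"


if __name__ == "__main__":
    libc_name, libc_version = platform.libc_ver()
    if wheel_likely_available(libc_name):
        print(f"glibc {libc_version} detected; a binary wheel should be usable.")
    else:
        print("glibc not detected (e.g. musl/Alpine); expect a build error.")
```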

This quickstart demonstrates how a higher-level Python library such as CuPy uses the native libraries from `nvidia-cusparse-cu12` for GPU-accelerated sparse matrix operations; the package itself exposes no Python API. The code checks for CUDA availability, then performs a basic sparse matrix-vector multiplication with CuPy's sparse module (`cupyx.scipy.sparse`).

try:
    import cupy as cp
    import cupyx.scipy.sparse as cps  # replaces the old cupy.sparse alias

    # Check for a usable CUDA device
    if cp.cuda.is_available():
        print(f"CUDA is available. CuPy version: {cp.__version__}")
        # Device objects have no .name attribute; read it from the device properties
        props = cp.cuda.runtime.getDeviceProperties(cp.cuda.Device().id)
        print(f"CUDA Device Name: {props['name'].decode()}")

        # Build a 3x3 sparse matrix directly on the GPU from (data, (row, col)) triplets.
        # cupyx.scipy.sparse needs CuPy arrays and bool/float/complex data, so the
        # values are created as float32 rather than NumPy integers.
        row = cp.array([0, 1, 2, 0])
        col = cp.array([0, 1, 2, 1])
        data = cp.array([1, 2, 3, 4], dtype=cp.float32)
        shape = (3, 3)
        sparse_gpu = cps.csr_matrix((data, (row, col)), shape=shape)
        print("\nSparse Matrix (dense view):\n", sparse_gpu.toarray())

        # Perform a sparse matrix-vector multiplication on the GPU
        vector_gpu = cp.array([1.0, 2.0, 3.0], dtype=cp.float32)

        result_gpu = sparse_gpu @ vector_gpu
        print("\nGPU Sparse Matrix-Vector Product (using cuSPARSE via CuPy):\n", result_gpu)

    else:
        print("CUDA is not available. Please ensure a compatible NVIDIA GPU and driver are installed.")
        print("You may need to install cupy-cuda12x manually if using a specific CUDA version.")

except ImportError:
    print("CuPy is not installed. To run this example, install CuPy compatible with CUDA 12:")
    print("pip install cupy-cuda12x")
except Exception as e:
    print(f"An error occurred: {e}")