NVIDIA cuBLAS Runtime Libraries
The `nvidia-cublas` package provides the native runtime libraries for NVIDIA's cuBLAS (CUDA Basic Linear Algebra Subroutines) library. It acts as a foundational dependency that lets Python deep learning and scientific computing frameworks (such as PyTorch, TensorFlow, and CuPy) perform GPU-accelerated linear algebra efficiently. It is currently at version 13.3.0.5 and typically receives updates aligned with new NVIDIA CUDA Toolkit releases.
Common errors
- **`CUBLAS_STATUS_ALLOC_FAILED`**
  - Cause: cuBLAS cannot allocate enough GPU memory for the requested operation, often due to large model sizes, large batch sizes, fragmented memory, or other GPU processes consuming resources.
  - Fix: Reduce the batch size, shrink the model, free GPU memory by clearing unused variables or sessions, and make sure no other memory-intensive processes are running on the GPU.
- **ImportError: libcublas.so.&lt;version&gt;: cannot open shared object file: No such file or directory**
  - Cause: TensorFlow, PyTorch, or another framework cannot find the `libcublas.so` shared library, usually because of an incorrect CUDA Toolkit installation, a missing library path in `LD_LIBRARY_PATH`, or a mismatch between the expected and installed CUDA/cuBLAS versions.
  - Fix: Ensure the CUDA Toolkit is installed correctly and that its library path (e.g., `/usr/local/cuda/lib64`) is on `LD_LIBRARY_PATH`. Verify that the installed `nvidia-cublas` version matches the CUDA Toolkit version your framework expects; reinstalling the correct toolkit and aligning framework versions often resolves this.
- **RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`**
  - Cause: The cuBLAS library was not initialized before one of its functions was used, typically because the CUDA runtime failed to initialize or no valid CUDA context was established.
  - Fix: Ensure `cublasCreate()` succeeds before any cuBLAS operations. Verify that the CUDA installation is correct, GPU drivers are up to date, and the GPU is healthy and accessible; check environment variables such as `CUDA_VISIBLE_DEVICES`.
- **RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm(handle)`**
  - Cause: A cuBLAS kernel (e.g., a matrix multiplication) failed during execution. This can stem from invalid input parameters, out-of-bounds memory access, or other GPU execution failures.
  - Fix: Inspect input tensors for correct dimensions, valid values (no NaNs or Infs), and appropriate data types. Reduce the size or complexity of the operation if it may be hitting GPU resource limits. Setting `CUDA_LAUNCH_BLOCKING=1` forces synchronous execution and helps pinpoint the exact line of code causing the error.
- **ModuleNotFoundError: No module named 'nvidia'**
  - Cause: Although `nvidia-cublas` provides native libraries, it does not expose a `nvidia` Python module for direct import. This error occurs when an application tries to import a module under the `nvidia` namespace (e.g., `nvidia.dali`, or `nvidia.cublas` directly) and the corresponding Python package is not installed or discoverable in the environment.
  - Fix: Install the Python package that provides the missing `nvidia` submodule (e.g., `pip install nvidia-dali`, or `pip install nvidia-pyindex` if other NVIDIA Python utilities are needed), and make sure your environment is activated and the packages are installed for the interpreter you are running. To interact with cuBLAS from Python, use a higher-level library such as PyTorch or TensorFlow that handles the native calls.
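For the shared-library errors above, a quick first check is whether the dynamic linker can see cuBLAS at all. The following is a minimal diagnostic sketch using only the standard library; it merely reports what the linker search would find and does not load cuBLAS into any framework:

```python
import ctypes.util
import os

def diagnose_cublas_loading():
    """Report where the dynamic linker would find cuBLAS, if anywhere."""
    hits = []
    for name in ("cublas", "cublasLt"):
        # find_library searches the standard linker paths; returns None on a miss
        path = ctypes.util.find_library(name)
        hits.append((name, path))
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    return hits, ld_path

hits, ld_path = diagnose_cublas_loading()
for name, path in hits:
    print(f"lib{name}: {path or 'not found on the linker search path'}")
print(f"LD_LIBRARY_PATH: {ld_path or '(unset)'}")
```

If both libraries come back "not found" while a framework is raising `ImportError: libcublas.so...`, the fix is almost always a path problem rather than a broken install.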
Warnings
- Gotcha: The `nvidia-cublas` package does not expose a direct Python API. Its primary function is to provide the underlying native cuBLAS shared libraries that other Python libraries (e.g., PyTorch, TensorFlow, CuPy) link against to perform GPU-accelerated linear algebra operations.
- Gotcha: cuBLAS versions must be compatible with your installed NVIDIA GPU drivers and with the CUDA Toolkit version used by your deep learning framework. Mismatches can lead to runtime errors or performance issues.
- Gotcha: Installing `nvidia-cublas` via pip can conflict with existing system-wide or Conda-managed CUDA installations if `LD_LIBRARY_PATH` or other environment variables are not managed correctly, potentially leading to "DLL not found" or CUDA driver errors.
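To spot the kind of conflict described above before it bites, you can enumerate the places a cuBLAS copy is likely to live. The wheel layout below (`nvidia/cublas/lib` under site-packages) is an assumption based on how NVIDIA's pip wheels are typically laid out, and `/usr/local/cuda/lib64` is just the conventional toolkit location; adjust both for your system:

```python
import os
import sysconfig

def candidate_cublas_dirs():
    """Directories where a pip-installed wheel or a system CUDA toolkit
    might place libcublas (illustrative, not exhaustive)."""
    site = sysconfig.get_paths()["purelib"]
    candidates = [
        os.path.join(site, "nvidia", "cublas", "lib"),  # pip wheel layout (assumption)
        "/usr/local/cuda/lib64",                        # typical toolkit install
    ]
    candidates += os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # Keep only directories that actually exist on this machine
    return [d for d in candidates if d and os.path.isdir(d)]

for d in candidate_cublas_dirs():
    libs = sorted(f for f in os.listdir(d) if f.startswith("libcublas"))
    if libs:
        print(d, "->", libs)
```

Seeing `libcublas` under more than one of these directories is the classic setup for version-mismatch errors: whichever copy the linker finds first wins.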
Install
```shell
pip install nvidia-cublas
```
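On PyPI the cuBLAS runtime is also published as CUDA-versioned wheels (e.g., `nvidia-cublas-cu11`, `nvidia-cublas-cu12`), so after installing it is worth checking which variant, if any, your interpreter can actually see:

```python
from importlib import metadata

def installed_cublas_wheels():
    """Map each known cuBLAS distribution name to its installed version, or None."""
    found = {}
    for dist in ("nvidia-cublas", "nvidia-cublas-cu11", "nvidia-cublas-cu12"):
        try:
            found[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[dist] = None
    return found

for name, version in installed_cublas_wheels().items():
    print(name, version or "not installed")
```

If every entry prints "not installed" but `pip show` finds the package, you are likely running a different interpreter (or virtual environment) than the one pip installed into.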
Quickstart
```python
import torch

def check_cublas_availability():
    """Run a GPU matrix multiplication to confirm the CUDA/CUBLAS stack is usable."""
    try:
        if not torch.cuda.is_available():
            print("CUDA is not available. CUBLAS operations will run on CPU or not at all.")
            return
        print(f"CUDA is available. Device name: {torch.cuda.get_device_name(0)}")
        print(f"Number of CUDA devices: {torch.cuda.device_count()}")
        # A GPU matrix multiplication is typically dispatched to CUBLAS under the hood
        a = torch.randn(1000, 1000, device='cuda')
        b = torch.randn(1000, 1000, device='cuda')
        c = torch.matmul(a, b)
        print("Successfully performed a GPU matrix multiplication (likely using CUBLAS).")
        print(f"Result shape: {c.shape}")
    except Exception as e:
        print(f"An error occurred during CUDA operation: {e}")
        print("This might indicate an issue with CUBLAS, CUDA installation, or drivers.")

if __name__ == "__main__":
    check_cublas_availability()
```