{"id":2618,"library":"nvidia-cublas-cu11","title":"NVIDIA cuBLAS Runtime Libraries for Python (cu11)","description":"The `nvidia-cublas-cu11` package provides the native cuBLAS runtime libraries for NVIDIA GPUs, built for CUDA 11 environments. cuBLAS is NVIDIA's highly optimized implementation of BLAS (Basic Linear Algebra Subprograms), which is critical for accelerating AI and HPC workloads. The package lets Python environments access GPU computational resources for linear algebra, typically as a dependency of higher-level frameworks such as PyTorch and TensorFlow, or through wrappers like Numba. The current version is 11.11.3.6, with releases generally aligned with CUDA Toolkit updates and subsequent patch releases.","status":"active","version":"11.11.3.6","language":"en","source_language":"en","source_url":"https://developer.nvidia.com/cublas","tags":["cuda","nvidia","runtime","machine learning","deep learning","blas","linear algebra","gpu"],"install":[{"cmd":"pip install nvidia-cublas-cu11","lang":"bash","label":"Install via pip"}],"dependencies":[{"reason":"cuBLAS is part of the CUDA Toolkit; compatibility with the installed CUDA version is crucial.","package":"CUDA Toolkit","optional":false},{"reason":"Requires compatible NVIDIA GPU drivers for proper functionality.","package":"NVIDIA GPU Drivers","optional":false},{"reason":"Commonly used Python library for writing CUDA kernels and accessing GPUs from Python.","package":"numba","optional":true}],"imports":[{"note":"The `nvidia-cublas-cu11` package provides the underlying C/C++ runtime libraries, not a Python module, so there is no `import cublas`. Python users typically reach cuBLAS functionality through higher-level libraries such as Numba, PyTorch, or TensorFlow, which link against these native libraries.","wrong":"import cublas","symbol":"cuBLAS functionality","correct":"from numba import cuda  # indirect GPU access via higher-level libraries"}],"quickstart":{"code":"import numpy as np\nfrom numba import cuda\nimport math\n\n@cuda.jit\ndef matmul(A, B, C):\n    # Perform matrix multiplication C = A @ B\n    row, col = cuda.grid(2)\n    if row < C.shape[0] and col < C.shape[1]:\n        tmp = 0.0\n        for k in range(A.shape[1]):\n            tmp += A[row, k] * B[k, col]\n        C[row, col] = tmp\n\n# Example usage\nN = 256\nA_host = np.random.rand(N, N).astype(np.float32)\nB_host = np.random.rand(N, N).astype(np.float32)\nC_host = np.zeros((N, N), dtype=np.float32)\n\n# Allocate device memory and copy the inputs to the GPU\nA_device = cuda.to_device(A_host)\nB_device = cuda.to_device(B_host)\nC_device = cuda.to_device(C_host)\n\n# Configure the blocks and threads\nthreads_per_block = (16, 16)\nblocks_per_grid_x = math.ceil(A_host.shape[0] / threads_per_block[0])\nblocks_per_grid_y = math.ceil(B_host.shape[1] / threads_per_block[1])\nblocks_per_grid = (blocks_per_grid_x, blocks_per_grid_y)\n\n# Launch the kernel\nmatmul[blocks_per_grid, threads_per_block](A_device, B_device, C_device)\n\n# Copy the result back to the host\nC_result = C_device.copy_to_host()\n\nprint('Matrix multiplication completed on GPU via a custom Numba kernel.')\n# Optional verification against a CPU reference:\n# C_numpy = np.dot(A_host, B_host)\n# print(f\"Max absolute difference: {np.max(np.abs(C_result - C_numpy))}\")  # should be very small\n","lang":"python","description":"This quickstart demonstrates GPU-accelerated linear algebra with a custom Numba CUDA kernel. It performs a basic matrix multiplication, highlighting the necessary steps of device memory allocation, kernel execution, and result retrieval. Ensure you have Numba installed (`pip install numba`) plus a compatible NVIDIA GPU and driver. Note that a hand-written kernel like this one does not itself call cuBLAS; the libraries shipped by `nvidia-cublas-cu11` are consumed by frameworks such as PyTorch, TensorFlow, and CuPy, whose matrix routines link against them."},"warnings":[{"fix":"Always ensure the installed `nvidia-cublas-cu11` package (the `cu11` suffix indicates CUDA 11 compatibility) matches your system's CUDA Toolkit version. Consult NVIDIA's documentation for compatibility matrices.","message":"Mismatching `nvidia-cublas-cu11` versions with your installed NVIDIA CUDA Toolkit can lead to runtime errors, undefined behavior, or application crashes. cuBLAS versions are tightly coupled to CUDA versions.","severity":"breaking","affected_versions":"All versions"},{"fix":"Use frameworks like Numba (`from numba import cuda`), PyTorch (`import torch`), or TensorFlow (`import tensorflow`) to leverage GPU acceleration; these libraries handle the low-level interaction with cuBLAS.","message":"This package provides native shared libraries (`.so`, `.dll`) and is not intended for direct Python import. Users access cuBLAS functionality indirectly via higher-level Python libraries such as Numba, PyTorch, or TensorFlow, which bind to these underlying C/C++ libraries. Attempting `import cublas` will fail.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Verify that your `LD_LIBRARY_PATH` (Linux) or system `PATH` (Windows) includes the `lib64` (Linux) or `bin` (Windows) directory of your CUDA Toolkit installation (e.g., `/usr/local/cuda/lib64`). Tools like `ldd` (Linux) can help diagnose linking issues.","message":"Applications relying on cuBLAS require proper environment configuration, especially `LD_LIBRARY_PATH` (Linux) or `PATH` (Windows), so the dynamic loader can find `libcublas.so` (or the cuBLAS DLL on Windows). Incorrect paths lead to 'library not found' errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Monitor GPU memory usage with `nvidia-smi`. Reduce workload sizes or use smaller batches, and review cuBLAS function parameters for correctness, since invalid inputs can also trigger this error. Reinstalling `nvidia-cublas-cu11` can resolve perceived library mismatches in some cases.","message":"Out-of-memory errors and `CUBLAS_STATUS_INVALID_VALUE` are common when GPU memory is insufficient or call parameters are incorrect. Ensure your GPU has enough memory for the operation.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}
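Since the package ships only native shared libraries, a quick sanity check before debugging higher-level failures is to confirm that `libcublas.so.11` is actually present in the environment. A minimal sketch, assuming the wheel's conventional Linux layout (`site-packages/nvidia/cublas/lib/`); `cublas_lib_name` and `find_bundled_cublas` are hypothetical helpers, not part of any official API:

```python
# Sketch: locate (and optionally load) the cuBLAS runtime shipped by the
# nvidia-cublas-cu11 wheel. Assumes the wheel's standard Linux layout
# (site-packages/nvidia/cublas/lib/); helper names are illustrative only.
import ctypes
import pathlib
import sysconfig


def cublas_lib_name(cuda_major: int) -> str:
    # Expected Linux SONAME of the cuBLAS runtime for a given CUDA major version.
    return f"libcublas.so.{cuda_major}"


def find_bundled_cublas(cuda_major: int = 11):
    # Return the path to the bundled library, or None if the wheel is absent.
    site = pathlib.Path(sysconfig.get_paths()["purelib"])
    candidate = site / "nvidia" / "cublas" / "lib" / cublas_lib_name(cuda_major)
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    path = find_bundled_cublas()
    if path is None:
        print("nvidia-cublas-cu11 does not appear to be installed here.")
    else:
        try:
            # Loading resolves dynamic dependencies (e.g. libcublasLt), which
            # may require the wheel's lib directory on LD_LIBRARY_PATH.
            ctypes.CDLL(str(path))
            print(f"Loaded cuBLAS runtime from {path}")
        except OSError as exc:
            print(f"Found {path} but could not load it: {exc}")
```

This only checks file presence and loadability; actually executing cuBLAS routines still requires a working NVIDIA driver and a compatible GPU.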