{"library":"nvidia-cublas-cu11","title":"NVIDIA CUBLAS Runtime Libraries for Python (cu11)","description":"The `nvidia-cublas-cu11` package provides the native CUBLAS runtime libraries for NVIDIA GPUs, specifically for CUDA 11 environments. CUBLAS is NVIDIA's highly optimized implementation of BLAS (Basic Linear Algebra Subprograms), which is critical for accelerating AI and HPC workloads. The package ships shared libraries rather than a Python API, so it is typically installed as a dependency of higher-level frameworks such as PyTorch or TensorFlow, which load CUBLAS for their GPU linear algebra operations. The current version is 11.11.3.6, with releases generally aligned with CUDA Toolkit updates and their patch releases.","language":"python","status":"active","last_verified":"Thu May 14","install":{"commands":["pip install nvidia-cublas-cu11"],"cli":null},"imports":["from numba import cuda  # used by the quickstart; CUBLAS itself is reached indirectly via other libraries"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import numpy as np\nfrom numba import cuda\nimport math\n\n@cuda.jit\ndef matmul(A, B, C):\n    # Each thread computes one element of C = A @ B (plain CUDA kernel, no CUBLAS call)\n    row, col = cuda.grid(2)\n    if row < C.shape[0] and col < C.shape[1]:\n        tmp = 0.\n        for k in range(A.shape[1]):\n            tmp += A[row, k] * B[k, col]\n        C[row, col] = tmp\n\n# Example usage\nN = 256\nA_host = np.random.rand(N, N).astype(np.float32)\nB_host = np.random.rand(N, N).astype(np.float32)\nC_host = np.zeros((N, N), dtype=np.float32)\n\n# Allocate device memory\nA_device = cuda.to_device(A_host)\nB_device = cuda.to_device(B_host)\nC_device = cuda.to_device(C_host)\n\n# Configure the blocks and threads\nthreads_per_block = (16, 16)\nblocks_per_grid_x = int(math.ceil(A_host.shape[0] / threads_per_block[0]))\nblocks_per_grid_y = int(math.ceil(B_host.shape[1] / threads_per_block[1]))\nblocks_per_grid = (blocks_per_grid_x, blocks_per_grid_y)\n\n# Launch the 
kernel\nmatmul[blocks_per_grid, threads_per_block](A_device, B_device, C_device)\n\n# Copy the result back to the host\nC_result = C_device.copy_to_host()\n\nprint('Matrix multiplication completed on GPU (custom Numba CUDA kernel, no CUBLAS call).')\n# Optional verification against a CPU reference computed with NumPy\n# C_numpy = np.dot(A_host, B_host)\n# print(f\"Max absolute difference: {np.max(np.abs(C_result - C_numpy))}\") # Should be very small\n","lang":"python","description":"This quickstart runs a basic matrix multiplication on the GPU with a hand-written Numba CUDA kernel, showing the steps for device memory allocation, kernel launch, and result retrieval. The kernel does not call CUBLAS: `nvidia-cublas-cu11` only supplies the CUBLAS shared libraries, which higher-level frameworks load for their own linear algebra routines. Running the example requires Numba (`pip install numba`), a compatible NVIDIA GPU, and a CUDA Toolkit.","tag":null,"tag_description":null,"last_tested":"2026-04-25","results":[{"runtime":"python:3.10-alpine","exit_code":1},{"runtime":"python:3.10-slim","exit_code":-1},{"runtime":"python:3.11-alpine","exit_code":1},{"runtime":"python:3.11-slim","exit_code":-1},{"runtime":"python:3.12-alpine","exit_code":1},{"runtime":"python:3.12-slim","exit_code":-1},{"runtime":"python:3.13-alpine","exit_code":1},{"runtime":"python:3.13-slim","exit_code":-1},{"runtime":"python:3.9-alpine","exit_code":1},{"runtime":"python:3.9-slim","exit_code":-1}]},"compatibility":{"tag":null,"tag_description":null,"last_tested":"2026-05-14","installed_version":"11.11.3.6","pypi_latest":"11.11.3.6","is_stale":false,"summary":{"python_range":"3.9–3.13","success_rate":25,"avg_install_s":9.6,"avg_import_s":null,"wheel_type":"wheel"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine 
(musl)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"nvidia-cublas-cu11","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":9.9,"import_time_s":null,"mem_mb":null,"disk_size":"658M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"nvidia-cublas-cu11","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":8.8,"import_time_s":null,"mem_mb":null,"disk_size":"660M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim 
(glibc)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"nvidia-cublas-cu11","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":10.1,"import_time_s":null,"mem_mb":null,"disk_size":"652M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim 
(glibc)","variant":"nvidia-cublas-cu11","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":8.4,"import_time_s":null,"mem_mb":null,"disk_size":"651M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"nvidia-cublas-cu11","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":10.6,"import_time_s":null,"mem_mb":null,"disk_size":"657M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"nvidia-cublas-cu11","exit_code":1,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null}]}}