cuMM: CUDA Matrix Multiply Library
Version 0.8.2 · verified Fri May 01
cuMM is a high-performance CUDA matrix multiplication library designed for deep learning and scientific computing. It provides optimized GEMM (General Matrix Multiply) kernels and supports various precision formats. Version 0.8.2 requires Python >=3.8 and is actively maintained.
pip install cumm-cu126

Common errors
error: ModuleNotFoundError: No module named 'cumm'
cause: The package 'cumm-cu126' is installed, but Python cannot find the module, usually because of missing dependencies or an incorrect import. Note that the module name is exactly 'cumm' (no hyphen).
fix: Run 'pip install cumm-cu126' and import with 'import cumm' (no hyphen). Check that the CUDA 12.6 toolkit is available.
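Because the distribution name and the import name differ, it is easy to check which one your environment actually exposes. A minimal diagnostic using only the standard library (the helper name is illustrative):

```python
import importlib.util

def is_importable(name: str) -> bool:
    """Return True if 'name' resolves to an importable module."""
    return importlib.util.find_spec(name) is not None

# The PyPI distribution is 'cumm-cu126', but hyphens are invalid in
# Python identifiers, so the importable module name is plain 'cumm'.
print(is_importable("json"))        # stdlib module: True
print(is_importable("cumm-cu126"))  # hyphenated name never resolves: False
```

If `is_importable("cumm")` is False even after installation, the package landed in a different environment than the interpreter you are running.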
error: RuntimeError: CUDA error: no kernel image is available for execution on the device
cause: The GPU's architecture is not covered by cuMM's precompiled kernels, which ship for specific compute capabilities (e.g., sm_80, sm_86, sm_89, sm_90). Older or newer GPUs may have no matching kernel.
fix: Use a supported GPU (e.g., NVIDIA Ampere, Ada Lovelace, or Hopper), or rebuild cuMM from source with the appropriate architecture flags.
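To see whether a given card falls in that list, map its compute capability to an sm_ tag and compare against the architectures above. A sketch under those assumptions (in practice the (major, minor) pair would come from torch.cuda.get_device_capability(); the helper names are illustrative):

```python
# Compute capabilities covered by cuMM's prebuilt kernels, per the list above.
SUPPORTED_SMS = {"sm_80", "sm_86", "sm_89", "sm_90"}

def sm_tag(major: int, minor: int) -> str:
    """Format a (major, minor) compute capability as an sm_ architecture tag."""
    return f"sm_{major}{minor}"

def has_prebuilt_kernel(major: int, minor: int) -> bool:
    return sm_tag(major, minor) in SUPPORTED_SMS

print(has_prebuilt_kernel(8, 6))  # Ampere (e.g., RTX 30-series): True
print(has_prebuilt_kernel(7, 5))  # Turing (e.g., T4): False -> rebuild from source
```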
error: ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
cause: The CUDA runtime library (libcudart.so.12) is not installed or not on the dynamic linker's search path.
fix: Install the CUDA 12.6 toolkit and add its lib64 directory to LD_LIBRARY_PATH.
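A quick way to confirm the fix took effect is to check whether the toolkit's library directory is on the linker path. A minimal sketch, assuming the default install location /usr/local/cuda-12.6/lib64 (adjust for your system); note that the dynamic linker reads LD_LIBRARY_PATH at process startup, so export it in the shell before launching Python:

```python
import os

def on_linker_path(directory: str) -> bool:
    """Report whether 'directory' appears in LD_LIBRARY_PATH.

    The variable must have been exported *before* this process started, e.g.:
        export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH
    """
    dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return directory in dirs

print(on_linker_path("/usr/local/cuda-12.6/lib64"))
```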
Warnings
breaking: cuMM requires a compatible CUDA toolkit (CUDA 12.6) and NVIDIA GPU drivers. Running on an unsupported CUDA version may cause import errors or runtime crashes.
fix: Ensure your system has CUDA 12.6 installed and set LD_LIBRARY_PATH appropriately.
gotcha: The package name on PyPI is 'cumm-cu126', but the Python module to import is simply 'cumm'. Do not use the PyPI name in import statements.
fix: Use 'import cumm', not 'import cumm-cu126' (the latter is a syntax error, since hyphens are invalid in Python identifiers).
deprecated: cuMM versions before 0.7.0 used a different API with explicit gemm_ functions; the current API uses cumm.gemm directly.
fix: Upgrade to 0.8.2 and replace cumm.gemm_xx calls with cumm.gemm.
Imports
- cumm
  wrong:   import cumm-cu126
  correct: import cumm
- cumm.functional
  wrong:   from cumm_cu126 import functional
  correct: from cumm import functional
Quickstart
import cumm
import torch

# Allocate two 128x128 matrices on the GPU
x = torch.randn(128, 128, device='cuda')
y = torch.randn(128, 128, device='cuda')

# Multiply them with cuMM's GEMM kernel
z = cumm.gemm(x, y)
print(z.shape)
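For reference, GEMM computes the standard matrix product C[i][j] = Σₖ A[i][k]·B[k][j]. A plain-Python version of the same computation (no GPU required, helper name illustrative) makes the semantics explicit:

```python
def gemm_ref(a, b):
    """Reference matrix multiply on nested lists: C[i][j] = sum_k A[i][k]*B[k][j]."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(gemm_ref(a, b))  # -> [[19, 22], [43, 50]]
```

The Quickstart's cumm.gemm call produces the same result on the GPU, only over float tensors and at much higher throughput.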