cuMM: CUda Matrix Multiply Library

cuMM is a high-performance CUDA matrix multiplication library designed for deep learning and scientific computing. It provides optimized GEMM (General Matrix Multiply) kernels and supports various precision formats. Version 0.8.2 requires Python >=3.8 and is actively maintained.

pip install cumm-cu126
error ModuleNotFoundError: No module named 'cumm'
cause The package 'cumm-cu126' was installed, but Python cannot locate the module, typically because it was installed into a different environment than the one running the script, or because the import statement uses the wrong name. The module name is exactly 'cumm' (no hyphen).
fix Run 'pip install cumm-cu126' in the environment you are actually running, and import the module as 'import cumm' (no hyphen). Check that the CUDA 12.6 toolkit is available.
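
A minimal sketch for confirming the install and import name, assuming the package exposes a standard __version__ attribute (not verified for every release):

# Check that 'cumm' is importable from the current environment.
try:
    import cumm
    print("cumm found, version:", getattr(cumm, "__version__", "unknown"))
except ModuleNotFoundError:
    print("cumm is missing; run: pip install cumm-cu126")
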
error RuntimeError: CUDA error: no kernel image is available for execution on the device
cause The GPU architecture is not supported by the precompiled kernels in cuMM. cuMM ships kernels for specific compute capabilities (e.g., sm_80, sm_86, sm_89, sm_90). Older or newer GPUs may not have a matching kernel.
fix Use a supported GPU (e.g., NVIDIA Ampere, Ada Lovelace, Hopper) or rebuild cuMM from source with the appropriate architecture flags.
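
A quick way to check whether your GPU matches one of the shipped compute capabilities, using PyTorch's device query; the set of architectures below simply mirrors the ones named above and is an assumption, not an exhaustive list:

import torch

# Compute capability of the first CUDA device, e.g. (8, 6) -> sm_86.
major, minor = torch.cuda.get_device_capability(0)
sm = f"sm_{major}{minor}"
print("GPU reports", sm)
# Architectures the precompiled kernels are said to target (assumed set).
if sm not in {"sm_80", "sm_86", "sm_89", "sm_90"}:
    print("No matching precompiled kernel; rebuild cuMM from source for", sm)
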
error ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
cause CUDA runtime library (libcudart.so.12) is not installed or not in the library path.
fix Install the CUDA 12.6 toolkit and add its lib64 directory to LD_LIBRARY_PATH.
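
One way to confirm whether the CUDA 12 runtime can be loaded at all, a sketch using only the Python standard library:

import ctypes
import os

# Try to load the CUDA 12 runtime the way the dynamic loader would.
try:
    ctypes.CDLL("libcudart.so.12")
    print("libcudart.so.12 loaded successfully")
except OSError:
    print("libcudart.so.12 not found on the loader path")
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
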
breaking cuMM requires a compatible CUDA toolkit (CUDA 12.6) and NVIDIA GPU drivers. Running on an unsupported CUDA version may cause import errors or runtime crashes.
fix Ensure your system has CUDA 12.6 installed and set LD_LIBRARY_PATH appropriately.
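
If PyTorch is installed alongside cuMM (as in the example at the end of this page), it can report which CUDA runtime it was built against; a mismatch with 12.6 is a common source of the errors above. A small sketch:

import torch

# CUDA runtime version PyTorch was built with, e.g. '12.6'.
print("torch CUDA version:", torch.version.cuda)
print("CUDA device available:", torch.cuda.is_available())
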
gotcha The library name on PyPI is 'cumm-cu126', but the Python module to import is simply 'cumm'. Do not use the PyPI name in import statements.
fix Use 'import cumm' instead of 'import cumm-cu126'.
deprecated cuMM versions before 0.7.0 used a different API with explicit gemm_ functions. The new API uses cumm.gemm directly.
fix Upgrade to 0.8.2 and replace cumm.gemm_xx with cumm.gemm.
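
A compact migration sketch; the old-style name below is hypothetical, since the exact pre-0.7.0 gemm_ function names are not listed here:

# Pre-0.7.0 (hypothetical name, shown only to illustrate the pattern):
#   z = cumm.gemm_f32(x, y)
# 0.7.0 and later: call cumm.gemm directly, as in the example below.
#   z = cumm.gemm(x, y)
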

Basic GEMM operation using cuMM with PyTorch tensors.

import cumm
import torch

# Two 128x128 single-precision matrices allocated on the GPU.
x = torch.randn(128, 128, device='cuda')
y = torch.randn(128, 128, device='cuda')

# Multiply them with cuMM's GEMM kernel; the result stays on the GPU.
z = cumm.gemm(x, y)
print(z.shape)  # torch.Size([128, 128])
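
Continuing from the example above, the result can be sanity-checked against PyTorch's own matrix multiply; the tolerance is a rough choice and may need loosening for lower-precision inputs:

# Compare cuMM's output with torch.matmul on the same inputs.
ref = torch.matmul(x, y)
print("matches torch.matmul:", torch.allclose(z, ref, atol=1e-4))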