NVIDIA cuTENSOR Python Bindings (CUDA 13)
Version: 2.6.0 | Verified: Fri May 01 | Auth: no | Language: Python
Python bindings for NVIDIA cuTENSOR, a GPU-accelerated library of tensor primitives (contractions, reductions, and element-wise operations) aimed at deep learning and HPC workloads. Version 2.6.0 supports CUDA 13.x. Package versions track cuTENSOR releases and use major.minor.patch numbering.
Install: pip install cutensor-cu13
Common errors
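error ModuleNotFoundError: No module named 'cutensor'
cause The package is not installed in the active Python environment, or the installed variant does not match the CUDA version. cutensor-cu13 works only with CUDA 13.x.
fix Install the variant that matches your CUDA version into the same interpreter, e.g. 'python -m pip install cutensor-cu13' for CUDA 13.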
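A minimal diagnostic sketch, assuming only the standard library: it checks whether the current interpreter can see a `cutensor` module at all and maps a CUDA version string to the package name this page lists. The helper names `package_for_cuda` and `module_available` are hypothetical, and the mapping deliberately contains only the CUDA 13 variant documented here.

```python
import importlib.util

# Hypothetical mapping: this page documents only the CUDA 13 variant.
PACKAGE_FOR_CUDA_MAJOR = {13: "cutensor-cu13"}


def package_for_cuda(cuda_version: str) -> str:
    """Return the pip package name for a CUDA version string such as '13.0'."""
    major = int(cuda_version.split(".")[0])
    try:
        return PACKAGE_FOR_CUDA_MAJOR[major]
    except KeyError:
        raise ValueError(f"no cutensor variant listed here for CUDA {cuda_version}") from None


def module_available(name: str) -> bool:
    """True if `name` is importable by the current interpreter (checked without importing it)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("pip package:", package_for_cuda("13.0"))
    print("cutensor importable:", module_available("cutensor"))
```

If `module_available("cutensor")` is False right after a successful install, pip most likely installed into a different interpreter than the one running your code.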
error RuntimeError: cuTENSOR error: CUTENSOR_STATUS_INVALID_VALUE
cause A non-contiguous tensor or an unsupported data type was passed.
fix Make tensors contiguous (call .contiguous() on tensor objects that provide it, or np.ascontiguousarray() on NumPy arrays) and use float32, float64, complex64, or complex128.
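Before handing NumPy data to the bindings, the two causes above can be checked on the host. The sketch below uses only NumPy; the helper `operand_problems` is hypothetical, the dtype set is the one listed on this page (cuTENSOR itself may accept more), and treating "contiguous" as C-order (row-major) contiguity is an assumption.

```python
import numpy as np

# Dtypes listed on this page as supported; cuTENSOR itself may accept others.
SUPPORTED_DTYPES = {np.dtype(t) for t in (np.float32, np.float64, np.complex64, np.complex128)}


def operand_problems(arr: np.ndarray) -> list:
    """Return reasons `arr` may trigger CUTENSOR_STATUS_INVALID_VALUE (empty list if none)."""
    problems = []
    if arr.dtype not in SUPPORTED_DTYPES:
        problems.append(f"unsupported dtype {arr.dtype}")
    # Assumption: "contiguous" means C-order (row-major) contiguity.
    if not arr.flags["C_CONTIGUOUS"]:
        problems.append("array is not C-contiguous")
    return problems


if __name__ == "__main__":
    print(operand_problems(np.ones((3, 4), dtype=np.float32)))    # []
    print(operand_problems(np.ones((3, 4), dtype=np.int32).T))    # both problems reported
```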
error ImportError: libcutensor.so: cannot open shared object file
cause The dynamic loader cannot find libcutensor.so or the CUDA runtime libraries it depends on.
fix Install a compatible CUDA toolkit (13.x) or the corresponding nvidia-* pip packages: pip install nvidia-cuda-runtime-cu13 nvidia-cublas-cu13.
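To narrow down this error, you can ask the platform's library search whether it sees cuTENSOR at all. This is a sketch built on the standard `ctypes` module; `locate_cutensor` and `can_load` are hypothetical helper names. `ctypes.util.find_library` consults system search paths, so a library bundled inside a wheel may still load correctly even when it returns None.

```python
import ctypes
import ctypes.util
from typing import Optional


def locate_cutensor() -> Optional[str]:
    """Ask the platform library search for cuTENSOR; returns a name/path or None."""
    return ctypes.util.find_library("cutensor")


def can_load(libname: str) -> bool:
    """Try to load `libname`; False if the dynamic loader cannot find or open it."""
    try:
        ctypes.CDLL(libname)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    found = locate_cutensor()
    print("find_library('cutensor'):", found)
    if found is not None:
        print("loadable:", can_load(found))
```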
Warnings
breaking The Python API changed in cuTENSOR 2.0+; the old 'cutensor.init' pattern was removed.
fix Use 'from cutensor import tensor' directly; no explicit init call is needed.
gotcha Input tensors must be contiguous in memory; non-contiguous arrays may produce silently wrong results or crashes.
fix Call np.ascontiguousarray() on NumPy arrays before converting to cutensor tensors.
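A common way to end up with a non-contiguous array is a transpose, which NumPy returns as a view with swapped strides. The NumPy-only sketch below shows the layout change and the copy that `np.ascontiguousarray()` makes; it does not call cutensor.

```python
import numpy as np

x = np.arange(12, dtype=np.float32).reshape(3, 4)
xt = x.T                             # transposed view: same buffer, swapped strides
print(xt.flags["C_CONTIGUOUS"])      # False

xt_c = np.ascontiguousarray(xt)      # copies the data into a new row-major buffer
print(xt_c.flags["C_CONTIGUOUS"])    # True
print(np.array_equal(xt, xt_c))      # True: same values, different layout
print(np.shares_memory(xt, xt_c))    # False: the copy no longer aliases x
```

The copy costs memory and time, so convert once before building GPU tensors rather than inside a loop.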
deprecated 'cutensor.TensorHandle' is deprecated in favor of 'cutensor.tensor'.
fix Replace uses of 'TensorHandle' with the 'tensor' class.
Imports
- tensor
  wrong:   import cutensor
  correct: from cutensor import tensor
Quickstart
import numpy as np
from cutensor import tensor
# Create two tensors on GPU (requires CUDA context)
a = tensor(np.random.rand(3,4), device='cuda')
b = tensor(np.random.rand(4,5), device='cuda')
# Perform matrix multiplication
c = a @ b
print(c)
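The matrix multiplication above is the tensor contraction c[i,j] = sum over k of a[i,k] * b[k,j], the kind of operation cuTENSOR is built around. A CPU reference computed with NumPy is a cheap way to validate GPU output. This sketch covers only the host side: how to copy `c` back to host memory is not documented on this page, so that step is left out. The seeded generator is an assumption that lets you feed identical inputs to both paths.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the same inputs can be reused on the GPU
a = rng.random((3, 4))
b = rng.random((4, 5))

# Matrix multiplication written as a contraction over the shared index k.
c_ref = np.einsum("ik,kj->ij", a, b)

print(c_ref.shape)                 # (3, 5)
print(np.allclose(c_ref, a @ b))   # True
```

Comparing with `np.allclose` rather than exact equality allows for the different summation order a GPU kernel may use.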