NVIDIA cuTENSOR Python Bindings (CUDA 13)

2.6.0 verified Fri May 01 auth: no python

Python bindings for NVIDIA cuTENSOR, a GPU-accelerated tensor computation library optimized for deep learning and HPC. Version 2.6.0 supports CUDA 13.x. Releases track upstream cuTENSOR versions (major.minor.patch).

pip install cutensor-cu13
error ModuleNotFoundError: No module named 'cutensor'
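cause The package is not installed in the active environment, or the installed variant does not match the CUDA version. cutensor-cu13 only works with CUDA 13.x.
fix
Install the variant that matches your CUDA major version. For CUDA 13.x: 'pip install cutensor-cu13'.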
error RuntimeError: cuTENSOR error: CUTENSOR_STATUS_INVALID_VALUE
cause Non-contiguous tensor or unsupported data type.
fix
Ensure tensors are contiguous (use .contiguous() on tensor objects) and that the dtype is float32, float64, complex64, or complex128.
error ImportError: libcutensor.so: cannot open shared object file
cause Missing CUDA runtime libraries.
fix
Install a compatible CUDA toolkit (13.x) or the corresponding nvidia-* meta-packages: pip install nvidia-cuda-runtime-cu13 nvidia-cublas-cu13.
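When chasing the import errors above, it can help to first confirm what the active environment actually contains. The following is a minimal sketch using only the Python standard library; the helper name `diagnose` and the `cutensor` name prefix used for matching installed wheels are illustrative assumptions, not part of the cuTENSOR bindings.

```python
import importlib.metadata
import importlib.util


def diagnose(module_name="cutensor", dist_prefix="cutensor"):
    """Report whether a module is importable and which matching wheels are installed.

    `diagnose` is a hypothetical helper for illustration only.
    """
    # find_spec returns None when the top-level module cannot be located.
    importable = importlib.util.find_spec(module_name) is not None

    installed = set()
    for dist in importlib.metadata.distributions():
        try:
            name = dist.metadata["Name"] or ""
        except Exception:
            # Skip distributions with unreadable metadata.
            continue
        if name.lower().startswith(dist_prefix):
            installed.add(f"{name}=={dist.version}")

    return {"importable": importable, "installed": sorted(installed)}


print(diagnose())
```

If `importable` is false and `installed` is empty, the wheel is missing from this interpreter's environment. If a wheel is listed but the import still fails, the problem is more likely a CUDA version mismatch or a missing shared library.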
breaking The Python API changed in cuTENSOR 2.0+; the old 'cutensor.init' pattern was removed.
fix Use 'from cutensor import tensor' directly; no explicit init call needed.
gotcha Input tensors must be contiguous in memory; non-contiguous arrays may cause silent wrong results or crashes.
fix Call np.ascontiguousarray() on NumPy arrays before converting to cutensor tensors.
deprecated Using 'cutensor.TensorHandle' is deprecated in favor of 'cutensor.tensor'.
fix Replace 'TensorHandle' with the 'tensor' class.
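The contiguity and dtype requirements noted above can be checked on the host before any data reaches the GPU. This is a NumPy-only sketch under stated assumptions: the helper name `prepare` is hypothetical, and the dtype list simply mirrors the one given in the CUTENSOR_STATUS_INVALID_VALUE entry.

```python
import numpy as np

# Dtypes listed as supported in the error entry above.
SUPPORTED_DTYPES = (np.float32, np.float64, np.complex64, np.complex128)


def prepare(array):
    """Validate the dtype and return a C-contiguous copy or view of the input.

    `prepare` is a hypothetical helper, not part of the cutensor bindings.
    """
    array = np.asarray(array)
    if array.dtype.type not in SUPPORTED_DTYPES:
        raise TypeError(f"unsupported dtype: {array.dtype}")
    # ascontiguousarray copies only when the input is not already C-contiguous.
    return np.ascontiguousarray(array)


# A transposed array shares memory with its parent and is not C-contiguous.
transposed = np.arange(12, dtype=np.float32).reshape(3, 4).T
print(transposed.flags["C_CONTIGUOUS"])
print(prepare(transposed).flags["C_CONTIGUOUS"])
```

Transposes, strided slices, and similar views are the usual sources of non-contiguous arrays, so a check like this is most useful right after such operations.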

Basic tensor creation and matrix multiplication on GPU.

import numpy as np
from cutensor import tensor

# Create two float64 tensors on the GPU (requires a CUDA context)
a = tensor(np.random.rand(3, 4), device='cuda')  # shape (3, 4)
b = tensor(np.random.rand(4, 5), device='cuda')  # shape (4, 5)
# Matrix multiplication contracts the shared dimension of size 4
c = a @ b  # shape (3, 5)
print(c)
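To validate a GPU result, a CPU reference for the same contraction is useful. The sketch below uses only NumPy and writes the matrix product as an explicit index contraction with `np.einsum`. The variable names are illustrative, and copying the GPU result back to the host is not shown because that step depends on the bindings.

```python
import numpy as np

# Fixed seed so the reference is reproducible between runs.
rng = np.random.default_rng(0)
a_host = rng.random((3, 4))
b_host = rng.random((4, 5))

# Contract over the shared index k: C[i, j] = sum_k A[i, k] * B[k, j]
reference = np.einsum("ik,kj->ij", a_host, b_host)

print(reference.shape)
```

Comparing against such a reference with `np.allclose` rather than exact equality allows for differences in floating-point summation order between the CPU and GPU.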