NVIDIA CUTLASS Python DSL Base Libraries
NVIDIA CUTLASS Python DSL (`nvidia-cutlass-dsl-libs-base`) provides a Pythonic interface for writing high-performance CUDA kernels using CUTLASS's CuTe library and tensor abstractions. It enables kernel development with automatic compilation to optimized PTX/SASS, offering performance comparable to hand-written CUDA C++ while enhancing developer productivity. Currently at version 4.4.2, the library is actively developed with frequent releases, often tied to new CUDA Toolkit versions and NVIDIA GPU architectures.
Warnings
- breaking The legacy Python API package, previously named `cutlass` (e.g., `import cutlass`), was renamed to `cutlass_cppgen` in CUTLASS 4.2.0 (around September 2025). Direct imports of `cutlass` for the high-level C++ wrappers will fail.
- gotcha CUTLASS Python DSL (including `nvidia-cutlass-dsl-libs-base`) has strict compatibility requirements with specific CUDA Toolkit and NVIDIA driver versions. Mismatches can lead to runtime errors or compilation failures.
- gotcha Unexpected CPU overhead was introduced in version 4.3.4 of the CuTe DSL.
- gotcha Initial releases of CUTLASS DSL 4.0 had limited Python version support (e.g., Python 3.12 only). While newer versions expand this, ensure your Python version is explicitly supported.
- gotcha Versions prior to 4.4.1 could segfault when using `tvm-ffi` on aarch64 systems; upgrade to 4.4.1 or later if you hit this.
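Given the compatibility gotchas above, a quick pre-flight check can save debugging time. This is a minimal sketch, not an official utility: the `min_python` floor and the distribution name `nvidia-cutlass-dsl` are assumptions to adapt to the release notes of your version.

```python
import sys
from importlib import metadata

def check_environment(min_python=(3, 9)):
    """Return the installed nvidia-cutlass-dsl version string, or None if absent."""
    if sys.version_info[:2] < min_python:
        raise RuntimeError(
            f"Python {sys.version_info.major}.{sys.version_info.minor} "
            "may not be supported by this CUTLASS DSL release"
        )
    try:
        return metadata.version("nvidia-cutlass-dsl")
    except metadata.PackageNotFoundError:
        return None  # DSL not installed in this environment

print(check_environment())
```

Pair this with `nvidia-smi` output to confirm the driver matches your CUDA Toolkit before compiling kernels.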
Install
- pip install nvidia-cutlass-dsl
- pip install nvidia-cutlass-dsl-libs-base
- pip install nvidia-cutlass-dsl[cu13]
Imports
- cute
import cutlass.cute as cute
- from_dlpack
from cutlass.cute.runtime import from_dlpack
- cutlass
import cutlass_cppgen as cutlass
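Because of the 4.2.0 rename noted in the Warnings section, code that supports multiple CUTLASS versions may need to handle both package names. A hedged sketch for detecting which name is present, without importing either (the helper name is illustrative, not part of the library):

```python
from importlib import util

def cppgen_module_name():
    """Return the importable name of the high-level C++ wrappers, or None.

    `cutlass_cppgen` is the name from CUTLASS 4.2.0 onward. Note that a
    findable `cutlass` module is ambiguous: before 4.2.0 it held the legacy
    wrappers, but on current releases it hosts the CuTe DSL instead.
    """
    for name in ("cutlass_cppgen", "cutlass"):
        if util.find_spec(name) is not None:
            return name
    return None  # neither package installed
```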
Quickstart
import cutlass.cute as cute
import torch
from cutlass.cute.runtime import from_dlpack
@cute.kernel
def elementwise_add_kernel(
    gA: cute.Tensor,
    gB: cute.Tensor,
    gC: cute.Tensor,
):
    # Compute a global linear thread index from block and thread coordinates
    tidx, _, _ = cute.arch.thread_idx()
    bidx, _, _ = cute.arch.block_idx()
    bdim, _, _ = cute.arch.block_dim()
    thread_idx = bidx * bdim + tidx
    # Map the linear index to a 2D coordinate of the row-major tensor.
    # Real kernels would use CuTe's layout algebra to tile and partition.
    m, n = gA.shape
    mi = thread_idx // n
    ni = thread_idx % n
    # Bounds check, then element-wise addition
    if mi < m:
        gC[mi, ni] = gA[mi, ni] + gB[mi, ni]

@cute.jit
def elementwise_add(mA: cute.Tensor, mB: cute.Tensor, mC: cute.Tensor):
    # Host-side launcher: a simple 1D block/grid configuration
    block_size = 256  # threads per block
    m, n = mA.shape
    grid_size = (m * n + block_size - 1) // block_size  # enough blocks for all elements
    elementwise_add_kernel(mA, mB, mC).launch(
        grid=[grid_size, 1, 1], block=[block_size, 1, 1]
    )

M, N = 1024, 512
A_torch = torch.randn(M, N, dtype=torch.float32, device='cuda')
B_torch = torch.randn(M, N, dtype=torch.float32, device='cuda')
C_torch = torch.zeros(M, N, dtype=torch.float32, device='cuda')
# Convert torch tensors to CuTe tensors
mA = from_dlpack(A_torch).mark_layout_dynamic()
mB = from_dlpack(B_torch).mark_layout_dynamic()
mC = from_dlpack(C_torch).mark_layout_dynamic()
# Compile the host function (and the kernel it launches), then run it
compiled_add = cute.compile(elementwise_add, mA, mB, mC)
compiled_add(mA, mB, mC)
# Verify against a PyTorch reference computation (optional)
try:
torch.testing.assert_close(C_torch, A_torch + B_torch)
print("Kernel executed successfully and results match!")
except AssertionError as e:
print(f"Verification failed: {e}")
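Running the quickstart requires a CUDA GPU, but the row-major linear-index-to-coordinate arithmetic that a 1D-launched elementwise kernel relies on can be sanity-checked on the CPU. A minimal, framework-free sketch:

```python
def linear_to_coord(thread_idx, n):
    # Split a linear thread index into a (row, col) coordinate of an
    # M x N row-major tensor -- the same arithmetic a 1D grid of threads
    # uses to cover a 2D tensor element-wise.
    return thread_idx // n, thread_idx % n

M, N = 4, 3
coords = [linear_to_coord(i, N) for i in range(M * N)]
# Every element of the 4x3 tensor is visited exactly once
assert sorted(coords) == [(mi, ni) for mi in range(M) for ni in range(N)]
```

The same check scales to any (M, N); only the bounds guard matters when M * N is not a multiple of the block size.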