NVIDIA CUTLASS Python DSL
NVIDIA CUTLASS Python DSL (version 4.4.2) is a Python-based domain-specific language (DSL) for writing high-performance CUDA kernels. It provides a Pythonic interface to CUTLASS's CuTe library, enabling kernel development with automatic JIT compilation to optimized PTX/SASS for NVIDIA GPUs (Ampere, Hopper, Blackwell architectures). It aims for zero-cost abstraction, performance comparable to C++ kernels, and seamless integration with deep learning frameworks like PyTorch and JAX. The library maintains an active development pace with frequent updates and minor version releases.
Warnings
- breaking NVIDIA CUTLASS Python DSL (CuTe DSL) is a distinct project from the older 'CUTLASS Python' (which was a Python interface for C++ kernels). Existing code relying on the older interface will not be compatible.
- gotcha The DSL requires a specific NVIDIA CUDA Toolkit version. For example, version 4.4.2 supports Python 3.10-3.14 and requires CUDA Toolkit 12.0+ (with 13.1 recommended for latest features like GB300 and Hopper FMHA fixes). Incompatible toolkit versions can lead to performance regressions, compilation errors, or runtime issues.
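A quick host-side preflight check can catch the interpreter half of this before anything is imported. This is a minimal sketch: the version bounds are taken from the note above and apply to DSL 4.4.2 specifically, so adjust them for the release you install; `python_supported` is a hypothetical helper, not part of the DSL.

```python
import sys

# Supported interpreter range for CuTe DSL 4.4.2 (per the warning above)
MIN_PY, MAX_PY = (3, 10), (3, 14)

def python_supported(version=sys.version_info[:2]):
    """Return True if the given (major, minor) falls in the supported range."""
    return MIN_PY <= tuple(version) <= MAX_PY

print("Python", ".".join(map(str, sys.version_info[:2])),
      "supported:", python_supported())
```

Checking the CUDA Toolkit side (12.0+, 13.1 recommended) still has to be done against your local `nvcc`/driver installation.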
- gotcha CuTe DSL has design limitations regarding Python language semantics within JIT-compiled functions. Complex data structures like lists, tuples, or dictionaries passed as dynamic values are treated as static containers and cannot be modified at runtime inside kernels. Returning dynamic values from kernels is also currently limited.
- gotcha Optional features like Apache TVM FFI, which improves PyTorch interoperability and reduces host overhead, require separate installation (`pip install apache-tvm-ffi torch-c-dlpack-ext`) and explicit enabling (e.g., via `enable_tvm_ffi=True` in `cute.runtime.from_dlpack` or by setting `CUTE_DSL_ENABLE_TVM_FFI=1` environment variable).
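If you take the environment-variable route, a minimal sketch looks like the following (the variable name and keyword argument are taken from the note above; as with most environment-driven toggles, it typically must be set before the DSL is first imported):

```python
import os

# Enable the optional Apache TVM FFI path globally. Set this before the
# first `import cutlass.cute` so the DSL can pick it up at init time.
# Assumes the extras are installed: apache-tvm-ffi, torch-c-dlpack-ext.
os.environ["CUTE_DSL_ENABLE_TVM_FFI"] = "1"

# Alternative, per-tensor opt-in (per the note above):
#   cute.runtime.from_dlpack(t, enable_tvm_ffi=True)
```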
- breaking API changes in `cutlass.cute.arch` functions (e.g., `fence_proxy`, `warp_redux_sync`, `atomic_add`, `load`, `store`) in CUDA Toolkit 13.1+ environments now require string literals instead of enum arguments.
Install
- pip install nvidia-cutlass-dsl
- pip install nvidia-cutlass-dsl[cu13]
Imports
- cute
import cutlass.cute as cute
- kernel
from cutlass.cute import kernel
- jit
from cutlass.cute import jit
- from_dlpack
from cutlass.cute.runtime import from_dlpack
Quickstart
import cutlass.cute as cute
from cutlass.cute.runtime import from_dlpack
import torch

@cute.kernel
def elementwise_add_kernel(gA: cute.Tensor, gB: cute.Tensor, gC: cute.Tensor):
    # Get thread index (tidx), block index (bidx), and block dimension (bdim)
    tidx, _, _ = cute.arch.thread_idx()
    bidx, _, _ = cute.arch.block_idx()
    bdim, _, _ = cute.arch.block_dim()
    # Compute the global element index (simple 1D mapping for demonstration;
    # a real kernel would use CuTe layout algebra instead)
    global_idx = bidx * bdim + tidx
    # Guard against out-of-bounds threads, then add element-wise
    if global_idx < cute.size(gC):
        gC[global_idx] = gA[global_idx] + gB[global_idx]

@cute.jit
def launch_add_kernel(A: cute.Tensor, B: cute.Tensor, C: cute.Tensor):
    # Compute the launch configuration (ceiling division) and launch the kernel
    num_elements = cute.size(A)
    threads_per_block = 256  # example thread block size
    blocks_per_grid = (num_elements + threads_per_block - 1) // threads_per_block
    elementwise_add_kernel(A, B, C).launch(
        grid=[blocks_per_grid, 1, 1],
        block=[threads_per_block, 1, 1],
    )

if __name__ == '__main__':
    # Create example PyTorch tensors on GPU
    size = 1024 * 1024  # 1 million elements
    A_torch = torch.randn(size, dtype=torch.float32, device='cuda')
    B_torch = torch.randn(size, dtype=torch.float32, device='cuda')
    C_torch = torch.empty_like(A_torch)
    # Wrap the PyTorch tensors as CuTe tensors via DLPack, then launch
    launch_add_kernel(from_dlpack(A_torch), from_dlpack(B_torch), from_dlpack(C_torch))
    # Verify results (optional, using torch for comparison)
    C_expected = A_torch + B_torch
    assert torch.allclose(C_torch, C_expected, atol=1e-5), "Results do not match!"
    print("Kernel executed successfully and results verified.")