NVIDIA cuDNN Frontend

1.22.1 · active · verified Fri Apr 10

`nvidia-cudnn-frontend` is a Python library that provides a high-level, user-friendly API over the cuDNN deep learning backend. It facilitates building and executing optimized tensor-operation graphs, including various operation fusions, on NVIDIA GPUs. It is currently at version 1.22.1 and maintains an active release cadence, typically tracking new cuDNN backend releases.

Warnings

Requires a CUDA-enabled NVIDIA GPU and the cuDNN backend library at runtime; graph execution fails if no CUDA device is available.

Install

pip install nvidia-cudnn-frontend

Imports

import cudnn
import torch

Quickstart

This quickstart demonstrates how to initialize the cuDNN frontend, declare graph input tensors from PyTorch tensors, describe a convolution operation as a graph, build and execute the graph, and retrieve the output on a CUDA-enabled GPU.

import cudnn
import torch

# Ensure a CUDA device is available
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available. This library requires a CUDA-enabled GPU.")

# Example: create and execute a simple convolution graph.
# Define input and weight tensors on the GPU (NCHW layout).
x = torch.randn(1, 1, 28, 28, device="cuda", dtype=torch.float32)
w = torch.randn(16, 1, 3, 3, device="cuda", dtype=torch.float32)

# Create a cuDNN handle and a frontend graph; all I/O and compute in FP32
handle = cudnn.create_handle()
graph = cudnn.pygraph(
    handle=handle,
    io_data_type=cudnn.data_type.FLOAT,
    compute_data_type=cudnn.data_type.FLOAT,
)

# Declare graph input tensors matching the PyTorch tensors' shapes and strides
X = graph.tensor(name="X", dim=x.shape, stride=x.stride())
W = graph.tensor(name="W", dim=w.shape, stride=w.stride())

# Define a forward convolution operation
Y = graph.conv_fprop(
    image=X, weight=W, padding=[1, 1], stride=[1, 1], dilation=[1, 1]
)
# Mark the convolution result as a (non-virtual) graph output
Y.set_output(True).set_data_type(cudnn.data_type.FLOAT)

# Validate the graph, lower it to a backend operation graph,
# build execution plans from heuristics, and check support
graph.validate()
graph.build_operation_graph()
graph.create_execution_plans(
    [cudnn.heur_mode.A, cudnn.heur_mode.FALLBACK]
)
graph.check_support()
graph.build_plans()

# Allocate the output tensor and scratch workspace on the GPU
y_out = torch.empty(Y.get_dim(), device="cuda", dtype=torch.float32)
workspace = torch.empty(graph.get_workspace_size(), device="cuda", dtype=torch.uint8)

# Execute the graph, binding graph tensors to device buffers
graph.execute({X: x, W: w, Y: y_out}, workspace)
torch.cuda.synchronize()

print("Graph execution successful!")
print(f"Output tensor shape: {y_out.shape}")
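The spatial dimensions the graph infers for the output tensor follow the standard convolution output-size formula. A quick pure-Python sanity check, with no GPU required (the helper name here is illustrative, not part of the library's API):

```python
def conv_output_dim(in_dim: int, kernel: int, pad: int, stride: int, dilation: int) -> int:
    """Standard convolution output-size formula, as used by cuDNN and PyTorch."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_dim + 2 * pad - effective_kernel) // stride + 1

# Quickstart convolution: 28x28 input, 3x3 kernel, padding 1, stride 1, dilation 1
h_out = conv_output_dim(28, 3, 1, 1, 1)
w_out = conv_output_dim(28, 3, 1, 1, 1)
print((h_out, w_out))  # (28, 28): same-size output, so the full shape is 1 x 16 x 28 x 28
```

With padding 1 on a 3x3 kernel, the spatial size is preserved, which is why the quickstart's output matches the 28x28 input with 16 channels from the weight tensor.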
