PyTorch AO (torchao)

0.17.0 · active · verified Thu Apr 09

TorchAO is a PyTorch library for applying architecture optimization (AO) techniques, primarily quantization and sparsity, to deep learning models running mostly on GPUs. It accelerates inference and training through low-precision kernels, mixture-of-experts (MoE) optimizations, and quantization-aware training (QAT). The current version is 0.17.0; new versions with significant features ship frequently, often monthly.

Warnings
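
quantize_ mutates the model in place and returns None; keep using the original model object rather than the return value. Recipe constructors such as int8_dynamic_activation_int4_weight() have been progressively replaced by config classes (e.g. Int8DynamicActivationInt4WeightConfig) in newer releases, so check the release notes for your installed version. Many low-precision kernels assume recent CUDA GPUs and bfloat16 weights; CPU coverage varies by recipe.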

Install
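
TorchAO ships as a standard pip package; assuming a matching PyTorch is already installed, a typical install is:

pip install torchao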

Imports
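
The quickstart below uses the top-level quantize_ entry point plus a predefined recipe constructor:

from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight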

Quickstart

This quickstart demonstrates how to define a simple PyTorch model and apply a predefined post-training quantization recipe using `torchao.quantization.quantize_`.

import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight

# 1. Define a simple model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(10, 20)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(20, 5)

    def forward(self, x):
        return self.linear2(self.relu(self.linear1(x)))

model = MyModel()
print(f"Original model: {model}")

# 2. Define a quantization recipe: int8 dynamic activations, int4 weights
# This uses a predefined post-training quantization recipe
config = int8_dynamic_activation_int4_weight()

# 3. Apply quantization to the model
# quantize_ modifies the model in place and returns None,
# so keep using the original `model` object afterwards
quantize_(model, config)

print(f"\nQuantized model: {quantized_model}")

# Test with some dummy input
dummy_input = torch.randn(1, 10)
output = quantized_model(dummy_input)
print(f"\nOutput shape: {output.shape}")
assert isinstance(quantized_model.linear1, torch.nn.Module) # Verify structure
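
The overview above also mentions quantization-aware training. As a minimal sketch of the QAT flow, assuming the Int8DynActInt4WeightQATQuantizer exposed under torchao.quantization.qat (module paths and class names have moved between releases), training runs on a fake-quantized model before conversion to real low-precision kernels:

from torchao.quantization.qat import Int8DynActInt4WeightQATQuantizer

qat_quantizer = Int8DynActInt4WeightQATQuantizer()

# prepare() swaps linear layers for fake-quantized versions so the
# training loop observes quantization error in the forward pass
qat_model = qat_quantizer.prepare(MyModel())

# ... run the usual training loop on qat_model here ...

# convert() replaces the fake-quantized layers with actual
# low-precision quantized layers for inference
quantized = qat_quantizer.convert(qat_model)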
