{"id":2315,"library":"torchao","title":"PyTorch AO (torchao)","description":"TorchAO is a PyTorch-native library for applying architecture optimization (AO) techniques, primarily quantization and sparsity, to deep learning models, with a focus on GPUs. It accelerates training and inference through low-precision kernels, mixture-of-experts (MoE) optimizations, and quantization-aware training (QAT). The current version is 0.17.0; new releases, often with significant features, ship frequently (roughly monthly).","status":"active","version":"0.17.0","language":"en","source_language":"en","source_url":"https://github.com/pytorch/ao","tags":["pytorch","quantization","optimization","deep-learning","gpu","low-precision"],"install":[{"cmd":"pip install torchao","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core PyTorch dependency for model definition and operations.","package":"torch","optional":false}],"imports":[{"note":"The primary quantization function has a trailing underscore and modifies the model in place.","wrong":"from torchao.quantization import quantize # Missing underscore","symbol":"quantize_","correct":"from torchao.quantization import quantize_"},{"note":"Base class for the config objects that quantize_ accepts.","symbol":"AOBaseConfig","correct":"from torchao.core.config import AOBaseConfig"},{"note":"Example of a predefined quantization recipe. Since 0.9.0, recipes are config classes rather than lowercase functions.","wrong":"from torchao.quantization import int8_dynamic_activation_int4_weight # Pre-0.9.0 function-style recipe","symbol":"Int8DynamicActivationInt4WeightConfig","correct":"from torchao.quantization import Int8DynamicActivationInt4WeightConfig"},{"note":"Sparsity counterpart to quantize_; applies a sparsity config to a model in place.","symbol":"sparsify_","correct":"from torchao.sparsity import sparsify_"}],"quickstart":{"code":"import torch\nimport torch.nn as nn\nfrom torchao.quantization import quantize_, Int8DynamicActivationInt4WeightConfig\n\n# 1. Define a simple model\n# Input feature sizes are multiples of 32 so they split evenly into int4\n# weight groups; layers whose sizes don't divide evenly may be skipped.\nclass MyModel(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.linear1 = nn.Linear(256, 512)\n        self.relu = nn.ReLU()\n        self.linear2 = nn.Linear(512, 64)\n\n    def forward(self, x):\n        return self.linear2(self.relu(self.linear1(x)))\n\nmodel = MyModel()\nprint(f\"Original model: {model}\")\n\n# 2. Define a quantization config\n# A predefined post-training recipe: int8 dynamic activations, int4 weights\nconfig = Int8DynamicActivationInt4WeightConfig()\n\n# 3. Apply quantization to the model\n# quantize_ modifies the model in place and returns None,\n# so keep using `model` rather than assigning the result\nquantize_(model, config)\n\nprint(f\"\\nQuantized model: {model}\")\n\n# Run the quantized model on some dummy input\ndummy_input = torch.randn(1, 256)\noutput = model(dummy_input)\nprint(f\"\\nOutput shape: {output.shape}\")\nassert isinstance(model.linear1, nn.Module)  # Module structure is preserved","lang":"python","description":"This quickstart defines a simple PyTorch model and applies a predefined post-training quantization recipe in place using `torchao.quantization.quantize_`."},"warnings":[{"fix":"Consult the `torchao` documentation for version 0.9.0 or later. `quantize_` now takes a config object (a subclass of `AOBaseConfig`, such as `Int8DynamicActivationInt4WeightConfig`) that describes the quantization recipe.","message":"The `quantize_` API was overhauled in version 0.9.0, changing how quantization recipes are applied to models. Code written against the previous calling convention may fail or emit deprecation warnings.","severity":"breaking","affected_versions":"<0.9.0"},{"fix":"Refer to the latest `torchao` documentation (v0.16.0+) for the currently recommended configurations and quantization options, and migrate your code to the officially supported APIs.","message":"Older configurations and rarely used quantization options have been deprecated to streamline the library. Using them may trigger deprecation warnings now and errors once they are removed in future releases.","severity":"deprecated","affected_versions":"<0.16.0"},{"fix":"Avoid `torchao.prototype` modules in production-critical code. If you do use them, pin your `torchao` version and expect breaking changes when upgrading.","message":"Features in `torchao.prototype` modules are experimental: their APIs may change frequently without notice, or be removed entirely without prior deprecation. They are not considered stable.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Before deployment, check your CUDA version and GPU architecture against the `torchao` requirements for the specific features you intend to use.","message":"Optimal performance for `torchao`'s advanced kernels (e.g., MXFP8 MoE, W4A8) often requires specific CUDA versions (e.g., CUDA 12.8+) or GPU architectures (e.g., NVIDIA Blackwell GPUs such as the GB200). In unsupported environments you may see reduced performance, errors, or features that are unavailable.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}
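The "int8 dynamic activation, int4 weight" recipe in the quickstart can be made concrete with a small numerical sketch: weights are quantized ahead of time to 4-bit integers, activations are quantized to 8-bit integers at runtime from the actual input (hence "dynamic"), and the layer accumulates integer products before rescaling to floating point. This is a conceptual illustration in plain Python, not torchao's implementation: the helper names are hypothetical, and it assumes one symmetric scale per activation vector and per weight row, whereas torchao's real recipe uses its own scale granularity and optimized kernels.

```python
# Conceptual sketch only (hypothetical helpers, not torchao's API or kernels):
# what an "int8 dynamic activation, int4 weight" linear layer computes.


def quantize_symmetric(values, num_bits):
    """Map floats to signed ints in [-2**(b-1), 2**(b-1) - 1] using one scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 7 for int4, 127 for int8
    qmin = -(2 ** (num_bits - 1))  # -8 for int4, -128 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(qmin, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_weight_rows(weight):
    """Offline step: quantize each output row of the weight matrix to int4."""
    return [quantize_symmetric(row, num_bits=4) for row in weight]


def int8_act_int4_weight_linear(x, qweight_rows):
    """Runtime step: quantize the input to int8, accumulate in ints, rescale."""
    qx, x_scale = quantize_symmetric(x, num_bits=8)  # scale comes from this input
    out = []
    for qw, w_scale in qweight_rows:
        acc = sum(a * b for a, b in zip(qx, qw))  # integer multiply-accumulate
        out.append(acc * x_scale * w_scale)  # convert back to floating point
    return out


x = [0.5, -1.0, 0.25, 2.0]
weight = [[0.1, 0.2, -0.3, 0.4],
          [-0.5, 0.05, 0.6, -0.7]]

reference = [sum(a * b for a, b in zip(x, row)) for row in weight]
approx = int8_act_int4_weight_linear(x, quantize_weight_rows(weight))
print("float:    ", reference)
print("quantized:", approx)
```

The gap between the two printed lists is the quantization error; finer-grained scales (for example, one per group of input features) are a common way to shrink it.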