AutoAWQ
AutoAWQ implements AWQ (Activation-aware Weight Quantization), a 4-bit weight quantization algorithm for large language models that can deliver up to a 2x inference speedup. The library is deprecated as of v0.2.9 (April 2025), with vLLM having adopted AWQ natively. Last tested with Torch 2.6.0 and Transformers 4.51.3.
pip install autoawq

Common errors
error ImportError: cannot import name 'AutoAWQForCausalLM' from 'awq'
cause The class name is case-sensitive ('AutoAWQForCausalLM', with AWQ capitalized), or the installed autoawq version is too old to export it.
fix
Run: pip install autoawq --upgrade and use: from awq import AutoAWQForCausalLM
error ModuleNotFoundError: No module named 'awq'
cause AutoAWQ is not installed, or the wrong name is imported: the PyPI package is 'autoawq', but the importable module is 'awq'.
fix
Install the package: pip install autoawq, then use: from awq import ...
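If unsure what is installed, pip show autoawq confirms the package; the import name remains awq.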
Warnings
breaking AutoAWQ is officially deprecated as of v0.2.9. No further updates or bug fixes will be provided. Users are advised to migrate to vLLM, which has adopted AWQ natively.
fix Migrate to vLLM (pip install vllm) and use vLLM's built-in AWQ support.
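For reference, a minimal sketch of loading an AWQ checkpoint with vLLM's offline LLM API (the model name reuses the one from the Quickstart below):

from vllm import LLM, SamplingParams

# quantization='awq' selects vLLM's AWQ kernels for the pre-quantized weights
llm = LLM(model='casperhansen/mixtral-instruct-awq', quantization='awq')
params = SamplingParams(max_tokens=100)
print(llm.generate(['Hello, how are you?'], params)[0].outputs[0].text)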
gotcha Import path confusion: some online examples show 'from auto_gptq import ...', but AutoAWQ is a separate library. Do not confuse it with GPTQ (auto_gptq).
fix Use 'from awq import AutoAWQForCausalLM' (note the lowercase 'awq').
gotcha Transformers compatibility is fragile. AutoAWQ v0.2.9 was last tested with Transformers 4.51.3. Using newer versions may cause silent inference errors or import failures.
fix Pin transformers to <=4.51.3, or upgrade to vLLM which tracks latest transformers versions.
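For example, a pinned install (the quotes keep the shell from interpreting '<='):

pip install "transformers<=4.51.3"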
Install
pip install autoawq[extras]

Imports
- AutoAWQForCausalLM
from awq import AutoAWQForCausalLM
- Quantization config: AutoAWQ does not define an AutoAWQConfig class; quantization settings are passed to model.quantize() as a plain dict (see the sketch below).
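A minimal sketch of the dict-based settings, using the keys from AutoAWQ's examples (w_bit, q_group_size, zero_point, version):

quant_config = {
    'zero_point': True,   # asymmetric quantization with a zero point
    'q_group_size': 128,  # weights quantized in groups of 128
    'w_bit': 4,           # 4-bit weights
    'version': 'GEMM',    # GEMM kernel; 'GEMV' favors batch size 1
}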
Quickstart
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'casperhansen/mixtral-instruct-awq'

# The checkpoint is already quantized, so load it with from_quantized,
# not from_pretrained; fuse_layers=True enables the fused inference kernels
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

tokens = tokenizer("Hello, how are you?", return_tensors='pt').input_ids.cuda()
outputs = model.generate(tokens, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
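To produce an AWQ checkpoint yourself, a sketch following AutoAWQ's documented quantize-and-save flow; the source model and output path here are illustrative:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'mistralai/Mistral-7B-Instruct-v0.2'  # illustrative FP16 source model
quant_path = 'mistral-instruct-v0.2-awq'           # illustrative output directory
quant_config = {'zero_point': True, 'q_group_size': 128, 'w_bit': 4, 'version': 'GEMM'}

# Load FP16 weights, run AWQ calibration, then write the 4-bit checkpoint
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)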