AutoAWQ
AutoAWQ implements AWQ (Activation-aware Weight Quantization), a 4-bit weight quantization algorithm for large language models that can deliver up to a 2x inference speedup. The library is deprecated as of v0.2.9 (April 2025), with vLLM having adopted AWQ natively. Last tested with Torch 2.6.0 and Transformers 4.51.3.
pip install autoawq

Common errors
error ImportError: cannot import name 'AutoAWQForCausalLM' from 'awq'
cause The class name is case-sensitive ('AutoAWQForCausalLM', with AWQ capitalized), or the installed autoawq version is too old to export it.
fix
Run: pip install autoawq --upgrade and use: from awq import AutoAWQForCausalLM
error ModuleNotFoundError: No module named 'awq'
cause AutoAWQ is not installed, or the wrong name is imported: the PyPI package is 'autoawq', but the importable module is 'awq'.
fix
Install the package: pip install autoawq, then use: from awq import ...
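If unsure what is installed, pip show autoawq confirms the package; the import name remains awq.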
Warnings
breaking AutoAWQ is officially deprecated as of v0.2.9. No further updates or bug fixes will be provided. Users are advised to migrate to vLLM, which has adopted AWQ natively.
fix Migrate to vLLM (pip install vllm) and use vLLM's built-in AWQ support.
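For reference, a minimal sketch of loading an AWQ checkpoint with vLLM's offline LLM API (the model name reuses the one from the Quickstart below):

from vllm import LLM, SamplingParams

# quantization='awq' selects vLLM's AWQ kernels for the pre-quantized weights
llm = LLM(model='casperhansen/mixtral-instruct-awq', quantization='awq')
params = SamplingParams(max_tokens=100)
print(llm.generate(['Hello, how are you?'], params)[0].outputs[0].text)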
gotcha Import path confusion: some online examples show 'from auto_gptq import ...', but AutoAWQ is a separate library. Do not confuse it with GPTQ (auto_gptq).
fix Use 'from awq import AutoAWQForCausalLM' (note the lowercase 'awq').
gotcha Transformers compatibility is fragile. AutoAWQ v0.2.9 was last tested with Transformers 4.51.3. Using newer versions may cause silent inference errors or import failures.
fix Pin transformers to <=4.51.3, or upgrade to vLLM which tracks latest transformers versions.
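For example, a pinned install (the quotes keep the shell from interpreting '<='):

pip install "transformers<=4.51.3"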
Install
pip install autoawq[extras]

Imports
- AutoAWQForCausalLM
from awq import AutoAWQForCausalLM
- Quantization config: AutoAWQ does not define an AutoAWQConfig class; quantization settings are passed to model.quantize() as a plain dict (see the sketch below).
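A minimal sketch of the dict-based settings, using the keys from AutoAWQ's examples (w_bit, q_group_size, zero_point, version):

quant_config = {
    'zero_point': True,   # asymmetric quantization with a zero point
    'q_group_size': 128,  # weights quantized in groups of 128
    'w_bit': 4,           # 4-bit weights
    'version': 'GEMM',    # GEMM kernel; 'GEMV' favors batch size 1
}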
Quickstart
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'casperhansen/mixtral-instruct-awq'

# The checkpoint is already quantized, so load it with from_quantized,
# not from_pretrained; fuse_layers=True enables the fused inference kernels
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

tokens = tokenizer("Hello, how are you?", return_tensors='pt').input_ids.cuda()
outputs = model.generate(tokens, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
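To produce an AWQ checkpoint yourself, a sketch following AutoAWQ's documented quantize-and-save flow; the source model and output path here are illustrative:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'mistralai/Mistral-7B-Instruct-v0.2'  # illustrative FP16 source model
quant_path = 'mistral-instruct-v0.2-awq'           # illustrative output directory
quant_config = {'zero_point': True, 'q_group_size': 128, 'w_bit': 4, 'version': 'GEMM'}

# Load FP16 weights, run AWQ calibration, then write the 4-bit checkpoint
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)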