{"id":8937,"library":"diffq","title":"DiffQ: Differentiable Quantization Framework for PyTorch","description":"DiffQ is a differentiable quantization framework for PyTorch that provides tools to quantize PyTorch models, primarily focusing on large language models (LLMs) and computer vision models. It enables quantization-aware training and leverages various quantization methods like GPTQ, HQQ, and AWQ. Currently at version 0.2.4, it has seen active development, especially in late 2023, with periodic releases addressing new features and bug fixes.","status":"active","version":"0.2.4","language":"en","source_language":"en","source_url":"https://github.com/facebookresearch/diffq","tags":["pytorch","quantization","differentiable","machine-learning","deep-learning","llm","compression"],"install":[{"cmd":"pip install diffq","lang":"bash","label":"Install base DiffQ"},{"cmd":"pip install diffq[hqq_ext] # for HQQ support\npip install diffq[awq_ext] # for AWQ support\npip install bitsandbytes # for 8-bit quantization","lang":"bash","label":"Install optional dependencies for specific quantizers"}],"dependencies":[{"reason":"Core dependency for low-level quantization operations.","package":"pytorch_quantization","optional":false},{"reason":"Underlying deep learning framework. 
Version compatibility is crucial.","package":"torch","optional":false},{"reason":"Commonly used for integrating with pre-trained models, especially LLMs, in many examples.","package":"transformers","optional":true},{"reason":"Required for 8-bit quantization methods, especially with Hugging Face Transformers.","package":"bitsandbytes","optional":true}],"imports":[{"symbol":"DiffQModel","correct":"from diffq import DiffQModel"},{"symbol":"BaseQuantizationConfig","correct":"from diffq import BaseQuantizationConfig"},{"symbol":"GPTQQuantizer","correct":"from diffq.quantizers import GPTQQuantizer"},{"note":"Model architectures provided by DiffQ are typically under `diffq.models`.","wrong":"from diffq import ViTForImageClassification","symbol":"ViTForImageClassification","correct":"from diffq.models import ViTForImageClassification"}],"quickstart":{"code":"import torch\nimport torch.nn as nn\nfrom diffq import DiffQModel, BaseQuantizationConfig\n\n# 1. Define a simple PyTorch model\nclass SimpleModel(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.linear1 = nn.Linear(10, 20)\n        self.relu = nn.ReLU()\n        self.linear2 = nn.Linear(20, 1)\n\n    def forward(self, x):\n        return self.linear2(self.relu(self.linear1(x)))\n\nmodel = SimpleModel()\n\n# 2. Define a basic quantization configuration\n# For actual quantization (e.g., 'gptq', 'hqq', 'awq'),\n# additional steps with a specific quantizer (e.g., GPTQQuantizer)\n# and a dataloader would be required.\nquant_config = BaseQuantizationConfig(\n    quant_method=\"none\", # Use \"gptq\", \"hqq\", \"awq\" for actual methods\n    w_bits=8,\n    w_group_size=128, # Not strictly applicable for \"none\", but often part of config\n    w_sym=False,\n    w_mse_scheme=\"per_tensor\"\n)\n\n# 3. 
Convert the PyTorch model into a DiffQModel\n# This automatically replaces modules with their quantized counterparts based on config.\ndiffq_model = DiffQModel(model, quantization_config=quant_config)\n\n# Print the model structure to see the converted modules\nprint(\"Original model:\")\nprint(model)\nprint(\"\\nDiffQModel (converted structure):\")\nprint(diffq_model)\n\n# Example forward pass (will not perform actual quantization during inference\n# without a preceding quantizer.quantize() call for methods like GPTQ)\ndummy_input = torch.randn(1, 10)\noutput = diffq_model(dummy_input)\nprint(f\"\\nOutput shape: {output.shape}\")","lang":"python","description":"This quickstart demonstrates how to convert a standard PyTorch model into a `DiffQModel` using a `BaseQuantizationConfig`. While this example uses `quant_method=\"none\"` for structural conversion, for actual quantization (e.g., 4-bit, 8-bit), you would specify a method like `\"gptq\"` and then run a specific quantizer (e.g., `GPTQQuantizer`) with a dataloader."},"warnings":[{"fix":"Verify your PyTorch, CUDA toolkit, and GPU driver versions against the `pytorch_quantization` and `diffq` requirements (usually found in their GitHub repositories' `setup.py` or documentation). Reinstalling PyTorch with the correct CUDA version is often necessary.","message":"Quantization libraries like `diffq` and its dependency `pytorch_quantization` are highly sensitive to PyTorch and CUDA version compatibility. Mismatches can lead to cryptic `RuntimeError`s, `cuDNN` errors, or unexpected behavior.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install the specific optional dependencies required for your desired quantization method. For example, `pip install diffq[hqq_ext]` for HQQ, or `pip install bitsandbytes` for 8-bit quantization.","message":"Many advanced quantization methods (e.g., HQQ, AWQ, 8-bit quantization via `bitsandbytes`) require additional, often hardware-specific, optional dependencies. 
Forgetting to install these will result in `ModuleNotFoundError` or `ImportError` when attempting to use the corresponding methods.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Review `diffq` documentation on supported module types and recommended practices for custom models. Start with simpler quantization schemes and gradually increase complexity. Be prepared to implement custom `QuantizedModule`s if necessary.","message":"While `diffq` supports general PyTorch models, its primary examples and optimizations are often for large language models (LLMs) from the `transformers` library. Applying aggressive quantization to arbitrary custom models or models with complex, non-standard layers may require manual adaptations or might not yield optimal results.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install `bitsandbytes` with `pip install bitsandbytes`. Ensure your CUDA environment and GPU drivers are compatible, as `bitsandbytes` is often CUDA-specific.","cause":"You are attempting to use an 8-bit quantization method (often through `transformers` integration) which relies on the `bitsandbytes` library, but it is not installed or not correctly linked to your CUDA setup.","error":"ModuleNotFoundError: No module named 'bitsandbytes'"},{"fix":"Ensure your `dataloader` yields data in the format expected by the model's `forward` method and the chosen quantizer. 
The message shows that a batch arrived as a tuple and `.input_ids` was read from it as an attribute, so a bare tuple such as `(input_ids,)` will not work. Yield batches that expose `input_ids` by name instead, for example the `BatchEncoding` object a `transformers` tokenizer returns.","cause":"This error typically occurs when a `dataloader` provided to a quantizer (like `GPTQQuantizer`) yields plain tuples while the model or the quantizer expects a batch object with a named `input_ids` field, as is common for language models.","error":"AttributeError: 'tuple' object has no attribute 'input_ids'"},{"fix":"Check the PyTorch and CUDA versions required by `pytorch_quantization` and `diffq`. Reinstalling PyTorch built for the exact matching CUDA version (e.g., `pip install torch==X.Y.Z+cuXXX -f https://download.pytorch.org/whl/torch_stable.html`) is often necessary. If the versions already match, check the shape, dtype, and memory layout of the tensors reaching the failing layer.","cause":"This is a low-level cuDNN error. It often indicates an incompatibility between your installed PyTorch version, CUDA toolkit, GPU driver, or the specific `pytorch_quantization` version being used, because quantization operations are sensitive to the entire CUDA software stack. It can also be raised when a tensor with an unsupported shape, dtype, or memory layout is passed to a cuDNN kernel.","error":"RuntimeError: cuDNN error: CUDNN_STATUS_BAD_PARAM"}]}