{"id":1968,"library":"compressed-tensors","title":"Compressed Tensors","description":"Compressed Tensors is a Python library designed for the efficient utilization and storage of compressed safetensors of neural network models. It provides tools for quantization, compression, and handling various compression schemes. The current version is 0.15.0, and the project maintains an active release cadence, frequently pushing minor updates and bug fixes.","status":"active","version":"0.15.0","language":"en","source_language":"en","source_url":"https://github.com/vllm-project/compressed-tensors","tags":["AI","ML","compression","quantization","neural networks","safetensors","vllm"],"install":[{"cmd":"pip install compressed-tensors","lang":"bash","label":"Basic installation"},{"cmd":"pip install 'compressed-tensors[accelerate]'","lang":"bash","label":"With Accelerate (for offloading/distributed)"}],"dependencies":[{"reason":"Required for certain features like offloading and distributed model handling.","package":"accelerate","optional":true}],"imports":[{"symbol":"CompressionConfig","correct":"from compressed_tensors.config import CompressionConfig"},{"symbol":"dispatch_model","correct":"from compressed_tensors.dispatch import dispatch_model"},{"symbol":"QuantizationScheme","correct":"from compressed_tensors.quantization import QuantizationScheme"},{"symbol":"SparseGPT","correct":"from compressed_tensors.compressors import SparseGPT"}],"quickstart":{"code":"import torch\nfrom transformers import AutoModelForCausalLM\nfrom compressed_tensors.config import CompressionConfig\nfrom compressed_tensors.dispatch import dispatch_model\n\n# 1. 
Define a simple model for demonstration\nclass DummyModel(torch.nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.linear1 = torch.nn.Linear(10, 20)\n        self.linear2 = torch.nn.Linear(20, 10)\n\n    def forward(self, x):\n        return self.linear2(self.linear1(x))\n\nmodel = DummyModel()\n\n# For a real model, load it with transformers instead:\n# model = AutoModelForCausalLM.from_pretrained(\n#     \"TinyLlama/TinyLlama-1.1B-Chat-v1.0\", torch_dtype=torch.float16\n# )\n\n# 2. Create a CompressionConfig. quantization_scheme=None leaves the\n#    weights untouched; set a scheme to actually compress the tensors.\ncompression_config = CompressionConfig(\n    quantization_scheme=None,\n    compressed_tensors_path=\"./compressed_model\",\n)\n\nprint(f\"Original model type: {type(model)}\")\n\n# 3. dispatch_model applies the configured compression/quantization:\n# from compressed_tensors.quantization import QuantizationScheme\n# compression_config_quantized = CompressionConfig(\n#     quantization_scheme=QuantizationScheme(num_bits=8, quant_method=\"per_tensor\")\n# )\n# compressed_model = dispatch_model(model, compression_config_quantized)\n\n# A more direct route compresses the state dict with a compressor:\n# from compressed_tensors.compressors import SparseGPT\n# compressor = SparseGPT()\n# compressed_model_state_dict = 
compressor.compress(model.state_dict(), compression_config)\n# print(f\"Compressed state dict keys: {list(compressed_model_state_dict.keys())}\")\n\nprint(\"Model preparation complete.\")\nprint(\"To apply actual compression, set 'quantization_scheme' in CompressionConfig.\")\nprint(f\"Compression config path: {compression_config.compressed_tensors_path}\")","lang":"python","description":"This quickstart sets up a `CompressionConfig` and sketches the typical flow for using `compressed-tensors` with a model. `dispatch_model` is the entry point for applying compression; this example is a simplified overview, so to perform real compression, set a `quantization_scheme` in the `CompressionConfig`."},"warnings":[{"fix":"Remove usages of `safe_permute` and implement the permutation logic directly, or use alternative utilities if available.","message":"The `safe_permute` utility function was removed in version 0.12.2. Any code relying on it will break.","severity":"breaking","affected_versions":"<=0.12.1"},{"fix":"Install `compressed-tensors` with the `accelerate` extra: `pip install 'compressed-tensors[accelerate]'`.","message":"The `accelerate` library is an optional dependency. Features that require it (e.g., offloading or distributed capabilities) raise a `ModuleNotFoundError` if `accelerate` is not installed.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure you are referencing the correct GitHub repository for issues, contributions, and the latest documentation.","message":"Between versions 0.12.2 and 0.14.0, the project repository moved from `neuralmagic/compressed-tensors` to `vllm-project/compressed-tensors`. 
While import paths are generally stable, be aware of this change in project ownership when tracking community activity, support, and future development.","severity":"gotcha","affected_versions":"All versions after repository migration (~0.14.0+)"},{"fix":"Upgrade to version 0.14.0.1 or newer to ensure correct file-writing behavior, especially when saving compressed models or associated metadata.","message":"Version 0.14.0.1 patched bugs related to file writing. Prior versions may have had issues with the integrity or correctness of saved compressed models or related artifacts.","severity":"gotcha","affected_versions":"<=0.14.0"},{"fix":"If upgrading from versions around 0.12.0, review compression behavior carefully and test compression results thoroughly.","message":"Version 0.12.0 introduced, reverted, and then re-applied a refactor of the module/parameter matching logic. Users upgrading through these versions may see subtle behavioral changes in how compression strategies target specific model layers and parameters.","severity":"gotcha","affected_versions":"0.12.0"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}