MCT Quantizers

MCT Quantizers is a Python library that provides the quantization infrastructure for neural network compression, covering both quantization-aware training and post-training quantization. It is part of the Model Compression Toolkit (MCT) ecosystem. Version 1.7.0 requires Python >=3.10. The library is actively maintained with regular releases.

pip install mct-quantizers
error ModuleNotFoundError: No module named 'mct_quantizers'
cause The package is not installed, or it is being imported under the wrong name.
fix Install with 'pip install mct-quantizers' and import as 'mct_quantizers' (underscore).
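For example, the package name uses a hyphen on the command line while the module name uses an underscore in Python:

# shell: pip install mct-quantizers
import mct_quantizers  # underscore, not hyphen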
error AttributeError: module 'mct_quantizers' has no attribute 'QuantizationConfig'
cause QuantizationConfig is being imported from the top-level package on a version older than 1.5.0; it only moved out of the mct_quantizers.quantization submodule in version 1.5.0.
fix Use 'from mct_quantizers import QuantizationConfig' (versions >=1.5.0) or 'from mct_quantizers.quantization import QuantizationConfig' (older versions).
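Where code has to run against both old and new releases, a version-tolerant import is a common pattern (a minimal sketch using only the two paths named above):

# Try the post-1.5.0 top-level import first, fall back to the old submodule path
try:
    from mct_quantizers import QuantizationConfig
except ImportError:
    from mct_quantizers.quantization import QuantizationConfig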
error RuntimeError: Quantization not supported for operation type ...
cause The model contains an unsupported layer/operation for quantization.
fix Check the list of supported layers in the documentation. Consider replacing unsupported layers with supported alternatives, or skipping quantization for those layers.
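As a hedged illustration of the replacement approach (nn.GELU here is a purely hypothetical stand-in for whatever op the error names; the real supported list is in the documentation):

import torch.nn as nn

# Hypothetical model whose activation is reported as unsupported by the quantizer
model = nn.Sequential(
    nn.Linear(10, 5),
    nn.GELU(),  # assume this op triggers the error above
)

# Swap the unsupported op for a supported alternative before quantizing
model[1] = nn.ReLU()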
breaking In version 1.5.0, the import path for QuantizationConfig changed from mct_quantizers.quantization to mct_quantizers. Old code will break.
fix Update imports: replace 'from mct_quantizers.quantization import QuantizationConfig' with 'from mct_quantizers import QuantizationConfig'.
deprecated PostTrainingQuantization for TensorFlow models is deprecated since version 1.6.0 and will be removed in future releases. Use PyTorch or ONNX variants instead.
fix If using TensorFlow, migrate your pipeline to PyTorch or use mct-quantizers with the ONNX backend.
gotcha When using multiple GPUs, quantization may fail if the model is not wrapped with DataParallel; this is not handled automatically.
fix Wrap your model with torch.nn.DataParallel before passing it to PostTrainingQuantization, as sketched below.

Basic usage: quantize a PyTorch model with post-training quantization.

import torch
from mct_quantizers import QuantizationConfig
from mct_quantizers.pytorch import PostTrainingQuantization

# Create a simple model
model = torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.ReLU())

# Define quantization configuration
config = QuantizationConfig(n_bits=8, per_channel=True)

# Apply post-training quantization
ptq = PostTrainingQuantization(model, config)
quantized_model = ptq.quantize()

# Save the quantized model
torch.save(quantized_model.state_dict(), 'quantized_model.pth')
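
As a quick sanity check (assuming quantize() returns a regular torch.nn.Module, as the save call above implies), the quantized model can be run on a dummy batch:

# Smoke test: the quantized model should still map a (1, 10) input to a (1, 5) output
dummy_input = torch.randn(1, 10)
with torch.no_grad():
    output = quantized_model(dummy_input)
print(output.shape)  # torch.Size([1, 5])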