Model Compression Toolkit


Model Compression Toolkit (MCT) is a library for neural-network quantization, pruning, and knowledge distillation. The current version is 2.6.0; releases ship monthly on PyPI.

pip install model-compression-toolkit
error AttributeError: module 'model_compression_toolkit' has no attribute 'pytorch_post_training_quantization'
cause Installed old version (<2.0) or incorrect import due to missing dependencies (e.g., torch not installed).
fix Upgrade: `pip install --upgrade model-compression-toolkit`. Ensure PyTorch is installed: `pip install torch`.
error KeyError: 'target_platform_name'
cause Using the old argument name; in v2.6 the correct argument is `target_platform` (though `target_platform_name` is still accepted as an alias).
fix Use `target_platform='default'` (or the `target_platform_name='default'` alias); `target_platform` is the preferred spelling.
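How a keyword alias like this typically behaves can be sketched in plain Python (the `quantize` wrapper below is purely illustrative, not MCT's actual implementation):

```python
def quantize(model, representative_data_gen,
             target_platform=None, target_platform_name=None):
    # Illustrative only: accept both spellings, prefer the new keyword.
    platform = target_platform or target_platform_name or 'default'
    return platform  # a real implementation would run quantization here

# Both spellings resolve to the same platform name:
assert quantize(None, None, target_platform='default') == 'default'
assert quantize(None, None, target_platform_name='default') == 'default'
```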
error RuntimeError: Quantization failed: representative_data_gen must be a callable that returns an iterable.
cause Providing a list of tensors directly instead of a generator function.
fix Wrap the data in a function: `def rep_data(): yield input_tensor`, then pass the function itself, not the data list.
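The callable-returning-iterable pattern can be shown torch-free (`make_batch` is a stand-in for whatever produces your input tensors):

```python
def make_batch():
    # Stand-in for torch.randn(1, 10) or a batch from a real data loader.
    return [[0.0] * 10]

# WRONG: a list of batches is data, not a callable.
data_list = [make_batch() for _ in range(5)]

# RIGHT: a zero-argument callable that returns an iterable of batches.
def rep_data():
    for _ in range(5):
        yield make_batch()

assert callable(rep_data)          # pass rep_data, not rep_data()
assert len(list(rep_data())) == 5  # yields one batch per iteration
```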
breaking In version 2.0+, the API changed from using `post_training_quantization` to `pytorch_post_training_quantization`. Old code will break.
fix Update import to `from model_compression_toolkit import pytorch_post_training_quantization` and adjust function calls accordingly.
gotcha The `model_compression_toolkit` package name uses underscores, but many users mistakenly import `mct`. The package does not provide a `mct` top-level module.
fix Use the full package name: `from model_compression_toolkit import ...` (or alias it yourself: `import model_compression_toolkit as mct`).
deprecated The `keras`-related interfaces (e.g., `keras_post_training_quantization`) are deprecated as of v2.6 and may be removed in future versions.
fix Transition to the PyTorch or TensorFlow 2.x native API. Use `tensorflow_post_training_quantization` for TF2.

Basic post-training quantization on a PyTorch model using representative data.

import torch
from model_compression_toolkit import pytorch_post_training_quantization as ptq

tmodel = torch.nn.Linear(10, 5)
tmodel.eval()

# representative_data_gen must be a callable that returns an iterable,
# not a list of tensors.
def representative_dataset():
    for _ in range(5):
        yield torch.randn(1, 10)

quantized_model, quantization_info = ptq(
    model=tmodel,
    representative_data_gen=representative_dataset,
    target_platform='default'
)
print('Quantization completed.')