Neural Network Compression Framework

3.1.0 · active · verified Wed Apr 15

The Neural Network Compression Framework (NNCF) is a Python library developed by Intel as part of the OpenVINO Toolkit. It provides post-training and training-time algorithms for making deep learning models smaller and faster at inference, supports models in PyTorch, TensorFlow (deprecated), ONNX, and OpenVINO IR formats, and offers techniques such as Post-Training Quantization, Quantization-Aware Training, Weight Compression, and Pruning. NNCF is actively maintained with frequent releases; the current stable version is 3.1.0.

Warnings

The TensorFlow backend is deprecated.

Install

pip install nncf

Imports

import nncf

Quickstart

This quickstart demonstrates how to perform 8-bit Post-Training Quantization (PTQ) on a pre-trained PyTorch model and convert it to an OpenVINO Intermediate Representation (IR) format using NNCF. It involves loading a model, creating a dummy calibration dataset, defining a transformation, and then applying `nncf.quantize`.

import nncf
import openvino as ov
import torch
from torchvision import datasets, transforms, models
import os

# 1. Load a pre-trained PyTorch model
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.eval()

# 2. Convert PyTorch model to OpenVINO Model
# Create a dummy input for tracing
dummy_input = torch.randn(1, 3, 224, 224)
ov_model = ov.convert_model(model, example_input=dummy_input)

# 3. Prepare a calibration dataset (example with random data)
# In a real scenario, use representative data from your dataset
class RandomDataset(torch.utils.data.Dataset):
    def __init__(self, size=300):
        self.size = size
    def __len__(self):
        return self.size
    def __getitem__(self, idx):
        return torch.randn(3, 224, 224), 0 # dummy label

calibration_dataset = RandomDataset()

# 4. Define a transformation function for the calibration dataset
def transform_fn(data_item):
    # NNCF passes each dataset item through this function and expects a
    # NumPy array for OpenVINO PTQ; the converted model takes a batched
    # input, so add the missing batch dimension.
    image, _ = data_item
    return image.unsqueeze(0).numpy()

# 5. Apply Post-Training Quantization (PTQ)
print("Applying Post-Training Quantization...")
quantized_ov_model = nncf.quantize(
    ov_model,
    nncf.Dataset(calibration_dataset, transform_fn)
)

# 6. Save the quantized OpenVINO model
output_dir = "./quantized_model"
os.makedirs(output_dir, exist_ok=True)
model_path = os.path.join(output_dir, "resnet18_quantized.xml")
ov.save_model(quantized_ov_model, model_path)
print(f"Quantized model saved to {model_path}")

# To load and use the quantized model:
# core = ov.Core()
# loaded_model = core.read_model(model_path)
# compiled_model = core.compile_model(loaded_model, "CPU")
# # Inference goes here
# print("Model loaded and compiled for inference.")
