{"id":6402,"library":"nncf","title":"Neural Networks Compression Framework","description":"The Neural Networks Compression Framework (NNCF) is a Python library developed by Intel as part of the OpenVINO Toolkit, providing advanced algorithms for optimizing deep learning models for faster and smaller inference. It supports models from PyTorch, TensorFlow (deprecated), ONNX, and OpenVINO IR formats, offering techniques such as Post-Training Quantization, Quantization-Aware Training, Weight Compression, and Pruning. NNCF is actively maintained with frequent releases, with the current stable version being 3.1.0.","status":"active","version":"3.1.0","language":"en","source_language":"en","source_url":"https://github.com/openvinotoolkit/nncf","tags":["AI/ML","model compression","quantization","pruning","OpenVINO","PyTorch","TensorFlow","ONNX"],"install":[{"cmd":"pip install nncf","lang":"bash","label":"Basic Install"},{"cmd":"pip install nncf[openvino]","lang":"bash","label":"For OpenVINO Backend"},{"cmd":"pip install nncf[torch]","lang":"bash","label":"For PyTorch Backend"},{"cmd":"pip install nncf[tensorflow]","lang":"bash","label":"For TensorFlow Backend (Deprecated)"}],"dependencies":[{"reason":"Required Python version.","package":"python","version":">=3.10"},{"reason":"Required for OpenVINO backend functionality. Installed with `nncf[openvino]`.","package":"openvino","optional":true},{"reason":"Required for PyTorch backend functionality. Installed with `nncf[torch]`.","package":"torch","optional":true},{"reason":"Required for TensorFlow backend functionality. 
This backend is deprecated.","package":"tensorflow","optional":true},{"reason":"Required for ONNX model processing.","package":"onnx","optional":true}],"imports":[{"note":"The primary API for Post-Training Quantization across supported frameworks.","symbol":"quantize","correct":"import nncf\nquantized_model = nncf.quantize(model, calibration_dataset)"},{"note":"Used for data-free or data-aware weight compression, especially for LLMs.","symbol":"compress_weights","correct":"import nncf\ncompressed_model = nncf.compress_weights(model)"},{"note":"Unified API for pruning algorithms, currently for PyTorch.","symbol":"prune","correct":"import nncf\npruned_model = nncf.prune(model, config)"},{"symbol":"NNCFConfig","correct":"from nncf import NNCFConfig"},{"note":"Used in `nncf.quantize` to specify model architecture for better optimization.","symbol":"ModelType","correct":"from nncf import ModelType"},{"note":"Defines quantization modes (e.g., symmetric/asymmetric) for `nncf.quantize`.","symbol":"QuantizationPreset","correct":"from nncf import QuantizationPreset"},{"symbol":"AdvancedQuantizationParameters","correct":"from nncf.quantization.advanced_parameters import AdvancedQuantizationParameters"},{"symbol":"IgnoredScope","correct":"from nncf import IgnoredScope"},{"note":"For saving PyTorch model compression configurations.","symbol":"get_config","correct":"from nncf.torch import get_config"},{"note":"For loading PyTorch model compression configurations.","symbol":"load_from_config","correct":"from nncf.torch import load_from_config"}],"quickstart":{"code":"import nncf\nimport openvino as ov\nimport torch\nfrom torchvision import datasets, transforms, models\nimport os\n\n# 1. Load a pre-trained PyTorch model\nmodel = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)\nmodel.eval()\n\n# 2. 
Convert the PyTorch model to an OpenVINO model\n# Create a dummy input for tracing\ndummy_input = torch.randn(1, 3, 224, 224)\nov_model = ov.convert_model(model, example_input=dummy_input)\n\n# 3. Prepare a calibration dataset (example with random data)\n# In a real scenario, use representative data from your dataset\nclass RandomDataset(torch.utils.data.Dataset):\n    def __init__(self, size=300):\n        self.size = size\n    def __len__(self):\n        return self.size\n    def __getitem__(self, idx):\n        if idx >= self.size:\n            raise IndexError(idx)  # lets plain iteration over the dataset terminate\n        return torch.randn(3, 224, 224), 0  # (image, dummy label)\n\ncalibration_dataset = RandomDataset()\n\n# 4. Define a transformation function for the calibration dataset\ndef transform_fn(data_item):\n    # Drop the label and add a batch dimension so the sample matches the\n    # (1, 3, 224, 224) model input; OpenVINO PTQ takes NumPy arrays\n    return data_item[0].unsqueeze(0).numpy()\n\n# 5. Apply Post-Training Quantization (PTQ)\nprint(\"Applying Post-Training Quantization...\")\nquantized_ov_model = nncf.quantize(\n    ov_model,\n    nncf.Dataset(calibration_dataset, transform_fn)\n)\n\n# 6. Save the quantized OpenVINO model\noutput_dir = \"./quantized_model\"\nos.makedirs(output_dir, exist_ok=True)\nmodel_path = os.path.join(output_dir, \"resnet18_quantized.xml\")\nov.save_model(quantized_ov_model, model_path)\nprint(f\"Quantized model saved to {model_path}\")\n\n# To load and use the quantized model:\n# core = ov.Core()\n# loaded_model = core.read_model(model_path)\n# compiled_model = core.compile_model(loaded_model, \"CPU\")\n# # Inference goes here\n# print(\"Model loaded and compiled for inference.\")","lang":"python","description":"This quickstart converts a pre-trained PyTorch model to an OpenVINO model, applies 8-bit Post-Training Quantization (PTQ) with NNCF, and saves the result in OpenVINO Intermediate Representation (IR) format. 
It involves loading a model, creating a dummy calibration dataset, defining a transformation, and then applying `nncf.quantize`."},"warnings":[{"fix":"Review any code that directly manipulates `NNCFGraph` objects and adapt it for `nx.MultiDiGraph` semantics.","message":"NNCFGraph, a core internal representation, was migrated from `nx.DiGraph` to `nx.MultiDiGraph` in v3.1.0 to support models with parallel/multi-edges. This can break code that directly interacts with NNCF's internal graph structure.","severity":"breaking","affected_versions":">=3.1.0"},{"fix":"Update references to `nncf.CompressWeightsMode.CB4_F8E4M3` to `nncf.CompressWeightsMode.CB4`.","message":"The `nncf.CompressWeightsMode.CB4_F8E4M3` mode option was renamed to `nncf.CompressWeightsMode.CB4`.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"Update references to `nncf.CompressWeightsMode.E2M1` to `nncf.CompressWeightsMode.MXFP4`.","message":"The `nncf.CompressWeightsMode.E2M1` mode option was renamed to `nncf.CompressWeightsMode.MXFP4`.","severity":"breaking","affected_versions":">=2.19.0"},{"fix":"Migrate TensorFlow-based NNCF workflows to PyTorch, OpenVINO IR, or ONNX backends.","message":"The TensorFlow backend is deprecated and will be removed in future releases. 
It is recommended to use PyTorch models for training-aware optimization and OpenVINO IR, PyTorch, or ONNX for post-training methods.","severity":"deprecated","affected_versions":"Introduced in 2.19.0, ongoing"},{"fix":"Consult NNCF documentation for alternative or officially supported compression methods.","message":"Several experimental NNCF methods, including NAS, Structural Pruning, AutoML, Knowledge Distillation, Mixed-Precision Quantization, and Movement Sparsity, are deprecated and will be removed in future releases.","severity":"deprecated","affected_versions":"Introduced in 2.19.0, ongoing"},{"fix":"Ensure Dropout layers are disabled in your model's training pipeline when applying NNCF QAT.","message":"When using Quantization-Aware Training with NNCF, it is generally recommended to turn off Dropout layers (and similar layers like DropConnect) during training to prevent accuracy degradation.","severity":"gotcha","affected_versions":"All"},{"fix":"For out-of-memory errors, reduce the batch size for NNCF training runs. For compiler errors, ensure CUDA development tools (e.g., the `nvcc` compiler) are installed and available on your PATH.","message":"Users may encounter 'CUDA out of memory' errors during compression-aware training due to the increased GPU memory footprint of NNCF-compressed models. Separately, `gcc`, `nvcc`, `ninja`, or `cl.exe` errors can occur if CUDA development tools are not properly installed or configured in the PATH/PYTHONPATH for PyTorch.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z"}