Optimum Intel
Optimum Intel extends the Hugging Face Transformers and Diffusers libraries, providing a framework to integrate Intel's specialized tools and libraries like OpenVINO, Neural Compressor, and Intel Extension for PyTorch. It enables optimization, conversion (e.g., to OpenVINO IR format), and accelerated inference of deep learning models on Intel architectures. The library is actively maintained with frequent minor version releases, currently at 1.27.0.
Common errors
- `ModuleNotFoundError: No module named 'optimum.intel.lpot'`
  - cause: The `lpot` subpackage was renamed to `neural_compressor`, and its import paths changed.
  - fix: Update your import statements. For example, change `from optimum.intel.lpot.quantization import LpotQuantizerForSequenceClassification` to `from optimum.intel import INCModelForSequenceClassification` (or the matching `INCModelForXxx` class).
- `RuntimeError: [ ERROR ] Failed to compile the model.`
  - cause: Often raised during `from_pretrained(export=True)` or `model.compile()` when the OpenVINO environment is misconfigured, the target device is incompatible, the model uses operations unsupported on that device, or OpenVINO development tools are missing.
  - fix: Ensure OpenVINO is correctly installed and its dependencies are met, and check the full traceback for the specific OpenVINO error. Try specifying `device="CPU"` or `device="GPU"` explicitly. For debugging, save the converted model with `model.save_pretrained()` and then load it without `export=True`. Refer to the OpenVINO documentation for device-specific requirements.
- `Segmentation fault (core dumped)` during inference with OpenVINO.
  - cause: Observed with concurrent inference calls using OpenVINO, suggesting a race condition or resource-management problem under heavy load or parallel execution.
  - fix: Investigate concurrency patterns and potential race conditions in your application. Synchronize access to shared resources, or process inferences sequentially if parallel calls cause instability. Monitor memory usage.
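If serializing inference resolves the crash, a minimal pattern is a lock around the inference call. This is a hypothetical sketch: `run_inference` is a placeholder standing in for your actual OpenVINO pipeline call, not part of the library API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# A single lock serializes access to the (hypothetical) model, so only one
# compiled-model call runs at a time even when many threads submit requests.
_infer_lock = threading.Lock()

def run_inference(model, text):
    # Placeholder for the real call, e.g. classifier(text) on an OVModel pipeline
    return {"input": text, "label": "POSITIVE"}

def safe_infer(model, text):
    with _infer_lock:  # only one thread executes inference at a time
        return run_inference(model, text)

# Usage: callers stay multi-threaded, but inference itself is sequential
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda t: safe_infer(None, t),
                            [f"sample {i}" for i in range(16)]))
```

This trades throughput for stability; if the lock fixes the crash, the underlying race is worth reporting upstream.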
Warnings
- deprecated The installation extras for specific backends (e.g., `[openvino]`, `[nncf]`, `[neural-compressor]`, `[ipex]`) via `pip install optimum-intel[...]` are deprecated and will be removed in a future release. Users are encouraged to install `optimum` and its specific backend extras directly or install `optimum-intel` base and then the backend libraries separately.
- breaking The `nf4_fp8` quantization modes have been removed. Code relying on these specific quantization modes will break.
- gotcha When using OpenVINO Runtime with PyTorch for post-processing (e.g., beam search), OpenVINO's default threading (oneTBB) can interact poorly with PyTorch's OpenMP, leading to performance degradation or delays.
Install
- pip install --upgrade-strategy eager "optimum-intel[openvino]"
- pip install optimum-intel
Imports
Each entry shows the original import and the Optimum Intel class that replaces it:
- OVModelForCausalLM
from transformers import AutoModelForCausalLM
from optimum.intel import OVModelForCausalLM
- OVModelForSeq2SeqLM
from transformers import AutoModelForSeq2SeqLM
from optimum.intel import OVModelForSeq2SeqLM
- OVStableDiffusionPipeline
from diffusers import StableDiffusionPipeline
from optimum.intel import OVStableDiffusionPipeline
- INCModelForSequenceClassification
from optimum.intel.lpot.quantization import LpotQuantizerForSequenceClassification
from optimum.intel import INCModelForSequenceClassification
Quickstart
from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForSequenceClassification
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load and convert the model to OpenVINO IR format on the fly
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
# Run inference
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
results = classifier("Optimum Intel is great!")
print(results)
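Building on the Quickstart, the one-time export can be persisted so later runs load the OpenVINO IR directly instead of re-converting; this is also the debugging step recommended for compile errors above. A sketch (the save directory name is a placeholder, and the first run downloads the model):

```python
from transformers import AutoTokenizer
from optimum.intel import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
save_dir = "distilbert_sst2_ov"  # placeholder local path

# One-time conversion to OpenVINO IR, then persist to disk
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
model.save_pretrained(save_dir)
AutoTokenizer.from_pretrained(model_id).save_pretrained(save_dir)

# Subsequent runs load the saved IR directly -- no export step needed
model = OVModelForSequenceClassification.from_pretrained(save_dir)
```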