OpenVINO GenAI

Version 2026.1.0.0 (verified Fri May 01)

OpenVINO GenAI provides optimized pipelines and execution methods for generative AI models (LLMs, image generation, and more), built on Intel's OpenVINO runtime for efficient inference on Intel hardware. The current version is 2026.1.0.0, with quarterly releases.

pip install openvino-genai
error ImportError: cannot import name 'LLMPipeline' from 'openvino_genai'
cause Incorrect import path or missing installation (openvino-genai not installed).
fix Run `pip install openvino-genai` and use the correct import: `from openvino_genai import LLMPipeline`.
error RuntimeError: Model not in OpenVINO format
cause Trying to load a Hugging Face model directly without converting to OpenVINO IR.
fix Convert with `optimum-cli export openvino --model <model-id> <output-dir>` before loading.
error TypeError: generate() missing required positional argument: 'max_new_tokens'
cause Using API version >=2025.1.0 where max_new_tokens became required.
fix Pass the `max_new_tokens` argument to `generate`.
breaking In 2025.1.0, the `generate` method signature changed: `max_new_tokens` is now required (no longer defaults to 256). Code using older versions may fail with TypeError.
fix Always pass `max_new_tokens` explicitly, e.g., `pipeline.generate(prompt, max_new_tokens=100)`.
breaking In 2026.0.0, the `import openvino_genai.pipeline` was removed. You must import directly from openvino_genai.
fix Use `from openvino_genai import LLMPipeline` instead of `from openvino_genai.pipeline import LLMPipeline`.
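A minimal before/after sketch of the 2026.0.0 import change (the model directory name is illustrative):

```python
# Pre-2026.0.0 (removed in 2026.0.0, raises ModuleNotFoundError):
# from openvino_genai.pipeline import LLMPipeline

# 2026.0.0 and later: import directly from the package root.
from openvino_genai import LLMPipeline

pipeline = LLMPipeline("./tiny-llama", "CPU")
```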
gotcha Model must be in OpenVINO IR format (`.xml` + `.bin`). Loading a Hugging Face model directly without conversion throws `RuntimeError: Model not in OpenVINO format`.
fix Use `optimum-cli export openvino --model <hf-id> <output-dir>` to convert first.
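If you prefer to stay in Python, the same conversion can also be done with the optimum-intel API instead of the CLI (a sketch, assuming `optimum[openvino]` is installed; the model id and output path are illustrative):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# export=True converts the Hugging Face checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the IR (.xml + .bin) plus tokenizer files so LLMPipeline can load the directory
model.save_pretrained("./tiny-llama")
tokenizer.save_pretrained("./tiny-llama")
```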
deprecated `openvino_genai.StableDiffusionPipeline` is deprecated in favor of `ImageGenerationPipeline` as of 2025.1.0.
fix Replace `StableDiffusionPipeline` with `ImageGenerationPipeline`.
gotcha When using GPU (OpenVINO with GPU plugin), ensure that OpenCL runtime is installed and compatible with your Intel GPU.
fix Install OpenVINO 2025.0.0 or later with the GPU plugin, and install the Intel OpenCL/compute runtime driver matching your GPU (see Intel's GPU driver installation instructions).
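Device selection happens at pipeline construction; a sketch of targeting the GPU with a manual CPU fallback (model directory is illustrative):

```python
from openvino_genai import LLMPipeline

try:
    # Requires the GPU plugin plus a compatible OpenCL runtime/driver
    pipeline = LLMPipeline("./tiny-llama", "GPU")
except RuntimeError:
    # Typical failure mode when the OpenCL runtime is missing or incompatible
    pipeline = LLMPipeline("./tiny-llama", "CPU")
```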

Basic LLM inference using OpenVINO GenAI. Ensure the model has been converted to OpenVINO IR format first.

from openvino_genai import LLMPipeline

# Create a pipeline from a directory containing OpenVINO IR files (.xml/.bin)
# and tokenizer files; the second argument selects the inference device.
pipeline = LLMPipeline("./tiny-llama", "CPU")

# Generate a response
result = pipeline.generate("What is the capital of France?", max_new_tokens=100)
print(result)
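For interactive use, `generate` also accepts a streamer callback that receives decoded tokens as they are produced (a sketch based on the OpenVINO GenAI samples; the return-value convention of the callback is an assumption here, with False meaning "continue generating"):

```python
from openvino_genai import LLMPipeline

pipeline = LLMPipeline("./tiny-llama", "CPU")

def streamer(subword: str) -> bool:
    # Print each decoded chunk as soon as it arrives
    print(subword, end="", flush=True)
    return False  # False = keep generating, True = stop early

pipeline.generate("What is the capital of France?",
                  max_new_tokens=100, streamer=streamer)
```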