OpenVINO Runtime
OpenVINO™ Runtime is an open-source toolkit for optimizing and deploying AI inference. It lets developers deploy pre-trained deep learning models through a unified API on a variety of Intel hardware (CPUs, GPUs, NPUs, etc.; discrete VPU/Myriad support was removed in 2023.0). The current version is 2026.1.0; major releases follow a year-based versioning scheme on a roughly quarterly-to-semi-annual cadence, aligned with Intel product releases.
Warnings
- breaking Major API overhaul: The `openvino.inference_engine` module and its classes (e.g., `IECore`, `IENetwork`, `ExecutableNetwork`, `Blob`) were deprecated and subsequently removed in favor of `openvino.runtime` in the 2022.x releases. The `openvino.runtime` namespace has itself since been deprecated in favor of importing directly from the top-level `openvino` package (e.g., `from openvino import Core`).
- gotcha Device availability and selection: If no device is specified, `compile_model` uses the AUTO plugin, which silently picks a device (often CPU); explicitly requesting a device that is not present raises a `RuntimeError` rather than falling back. Users might expect automatic GPU or NPU usage without explicit configuration.
- breaking Intermediate Representation (IR) version changes: OpenVINO's internal model format (IR) has evolved (e.g., from IR v10 to the current IR v11, introduced alongside API 2.0 in 2022.1). While `core.read_model` typically handles conversion for recent versions, very old `.xml` / `.bin` models might not load or behave as expected.
- breaking Python 3.9 and older are no longer supported. The `openvino` package now requires Python 3.10 or newer.
Install
pip install openvino
Imports
- Core
from openvino import Core
- Tensor
from openvino import Tensor
- Model
from openvino import Model
- Type
from openvino import Type
- opset12
import openvino.runtime.opset12 as ov_ops
Quickstart
import openvino as ov
from openvino.runtime import opset12 as ops
import numpy as np
import os
# 1. Create a Core object to manage devices and models
core = ov.Core()
# 2. Build a dummy model for demonstration
# In a real scenario, you would load from .xml/.bin using:
# model = core.read_model("path/to/model.xml")
input_shape = [1, 3, 224, 224] # Batch, Channels, Height, Width
input_node = ops.parameter(input_shape, ov.Type.f32, name="input")
output_node = ops.result(input_node) # Identity model: output shape equals input shape
model = ov.Model([output_node], [input_node], "dummy_model")
# 3. Compile the model for a specific device
# Use os.environ.get for dynamic device selection in production
device = os.environ.get("OPENVINO_DEVICE", "CPU") # Example: "GPU", "NPU"
print(f"Compiling model for device: {device}")
compiled_model = core.compile_model(model, device)
# 4. Create an inference request
infer_request = compiled_model.create_infer_request()
# 5. Prepare input data (random data for dummy model)
input_data = np.random.rand(*input_shape).astype(np.float32)
infer_request.set_input_tensor(ov.Tensor(input_data))
# 6. Perform inference
infer_request.infer()
# 7. Get output data
output_tensor = infer_request.get_output_tensor()
output_data = output_tensor.data
print(f"Inference successful. Output shape: {output_data.shape}")
print(f"First 5 output values: {output_data.flatten()[:5]}")