OpenVINO Runtime

2026.1.0 · active · verified Sat Apr 11

OpenVINO™ Runtime is an open-source toolkit for optimizing and deploying AI inference. It lets developers run pre-trained deep learning models through a unified API across Intel hardware (CPUs, GPUs, and NPUs). The current version is 2026.1.0; major releases typically follow a quarterly or semi-annual cadence aligned with Intel product launches.

Quickstart

This quickstart initializes the OpenVINO Runtime, builds a simple dummy model programmatically, compiles it for a target device (CPU by default), runs inference on random input data, and reads back the output. In a real application, replace the dummy model creation with loading an actual model.

import openvino as ov  # note: the legacy openvino.runtime namespace is deprecated; import openvino directly
import numpy as np
import os

# 1. Create a Core object to manage devices and models
core = ov.Core()

# 2. Create a dummy model for demonstration
# In a real scenario, you would load from .xml/.bin using: 
# model = core.read_model("path/to/model.xml")
input_shape = [1, 3, 224, 224] # Batch, Channels, Height, Width
output_shape = [1, 1000] # Batch, Class_count

input_node = ov.opset12.parameter(input_shape, ov.Type.f32, name="input")
output_node = ov.opset12.result(input_node) # Simple identity model
model = ov.Model([output_node], [input_node], "dummy_model")

# 3. Compile the model for a specific device
# Use os.environ.get for dynamic device selection in production
device = os.environ.get("OPENVINO_DEVICE", "CPU") # Example: "GPU", "NPU"
print(f"Compiling model for device: {device}")
compiled_model = core.compile_model(model, device)

# 4. Create an inference request
infer_request = compiled_model.create_infer_request()

# 5. Prepare input data (random data for dummy model)
input_data = np.random.rand(*input_shape).astype(np.float32)
infer_request.set_input_tensor(ov.Tensor(input_data))

# 6. Perform inference
infer_request.infer()

# 7. Get output data
output_tensor = infer_request.get_output_tensor()
output_data = output_tensor.data

print(f"Inference successful. Output shape: {output_data.shape}")
print(f"First 5 output values: {output_data.flatten()[:5]}")
