ONNX Runtime
ONNX Runtime is a cross-platform, high-performance accelerator for machine learning inference and training. It supports models exported from deep learning frameworks (e.g., PyTorch, TensorFlow/Keras) as well as classical ML libraries (e.g., scikit-learn). The project is actively maintained, with new releases roughly quarterly (plus patch releases) and a commitment to backwards compatibility.
Warnings
- breaking The `onnxruntime` and `onnxruntime-gpu` packages are mutually exclusive. Only one should be installed in a given Python environment. Installing both can lead to unexpected behavior or errors.
- breaking Since ONNX Runtime 1.10, execution providers (like CUDAExecutionProvider for GPU) must be explicitly specified when creating an `InferenceSession`. If not specified, it defaults to `CPUExecutionProvider` only. Older code that relied on implicit GPU usage will break or silently fall back to CPU.
- gotcha Input data shapes and data types must precisely match the ONNX model's expected inputs, otherwise `onnxruntime` will raise `INVALID_ARGUMENT` errors. Common mistakes include incorrect dimensions or using `float64` instead of the expected `float32`.
- gotcha When using `onnxruntime-gpu`, correct installation of the CUDA Toolkit and cuDNN libraries matching your `onnxruntime-gpu` version is crucial. Mismatched versions are a frequent cause of 'DLL not found' or 'Failed to create session' errors.
- deprecated The `generate()` API for generative AI models underwent breaking changes from ONNX Runtime GenAI 0.5.2 to 0.6.0, notably replacing `params.input_ids = input_tokens` with `generator.append_tokens(input_tokens)` and removing `generator.compute_logits()`.
- gotcha For pre- and post-processing steps using custom ONNX operators, the `onnxruntime-extensions` package must be separately installed and its custom operators registered with the `InferenceSession` via `session_options.register_custom_ops_library()`.
Install
- pip install onnxruntime
- pip install onnxruntime-gpu
- pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/
Imports
- InferenceSession
from onnxruntime import InferenceSession
- SessionOptions
from onnxruntime import SessionOptions
- get_available_providers
from onnxruntime import get_available_providers
- PyOrtFunction
from onnxruntime_extensions import PyOrtFunction
Quickstart
import onnxruntime as ort
import numpy as np
import os
# NOTE: This example assumes you have an ONNX model file named 'model.onnx'.
# You can typically export models from frameworks like PyTorch or TensorFlow to ONNX format.
# For a runnable example, you'd need to create a dummy model:
# e.g., using ONNX library:
# import onnx
# from onnx import TensorProto
# from onnx.helper import make_model, make_node, make_graph, make_tensor_value_info
# X = make_tensor_value_info('input', TensorProto.FLOAT, [None, 2])
# Y = make_tensor_value_info('output', TensorProto.FLOAT, [None, 2])
# node = make_node('Add', ['input', 'input'], ['output'])
# graph = make_graph([node], 'simple-graph', [X], [Y])
# onnx_model = make_model(graph)
# onnx.save(onnx_model, 'model.onnx')
model_path = os.environ.get('ONNX_MODEL_PATH', 'model.onnx')
# 1. Create an InferenceSession
# For GPU, add providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
session = ort.InferenceSession(model_path, providers=ort.get_available_providers())
print("Model inputs:")
for input_meta in session.get_inputs():
    print(f"  Name: {input_meta.name}, Shape: {input_meta.shape}, Type: {input_meta.type}")
print("\nModel outputs:")
for output_meta in session.get_outputs():
    print(f"  Name: {output_meta.name}, Shape: {output_meta.shape}, Type: {output_meta.type}")
# 2. Prepare input data (example for a model expecting a float32 array)
# Assuming the first input expects a 2D float32 array, e.g., shape (1, 2)
input_name = session.get_inputs()[0].name
input_shape = [dim if isinstance(dim, int) else 1 for dim in session.get_inputs()[0].shape] # Handle dynamic shapes
input_data = np.random.randn(*input_shape).astype(np.float32)
# 3. Run inference
outputs = session.run(None, {input_name: input_data})
# 4. Process outputs
print(f"\nOutput data type: {outputs[0].dtype}")
print(f"Output shape: {outputs[0].shape}")
print(f"Output data (first 5 elements): {outputs[0].flatten()[:5]}")