ONNX Runtime
ONNX Runtime is a cross-platform, high-performance machine learning inference and training accelerator. It enables faster customer experiences and lower costs by supporting models from various deep learning frameworks (e.g., PyTorch, TensorFlow/Keras) and classical ML libraries (e.g., scikit-learn). The library is actively maintained with new releases approximately quarterly, including patch releases, and commits to backwards compatibility.
Common errors
- `ModuleNotFoundError: No module named 'onnxruntime'`
  - cause: The `onnxruntime` module is not installed in the Python environment.
  - fix: Install the module with pip: `pip install onnxruntime`.
- `AttributeError: module 'onnxruntime' has no attribute 'InferenceSession'`
  - cause: `onnxruntime` is not installed correctly, or a local file named `onnxruntime.py` shadows the real module.
  - fix: Reinstall `onnxruntime` and rename any script called `onnxruntime.py` to avoid the conflict.
- `AttributeError: module 'onnxruntime' has no attribute 'OrtValue'`
  - cause: The installed `onnxruntime` version predates the `OrtValue` API.
  - fix: Upgrade to the latest version: `pip install --upgrade onnxruntime`.
- `AttributeError: module 'onnxruntime' has no attribute 'SessionOptions'`
  - cause: The installed `onnxruntime` version is too old or the installation is broken.
  - fix: Upgrade to the latest version: `pip install --upgrade onnxruntime`.
- `[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(double)) , expected: (tensor(float)).`
  - cause: The input array's data type (e.g., `numpy.float64`) does not match the data type the ONNX model declares (e.g., `numpy.float32`).
  - fix: Cast the input before running inference, typically to `numpy.float32`, e.g., `input_data.astype(numpy.float32)`.
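The dtype mismatch in the last error above is usually fixed with a single cast. The helper below is a sketch; the `ONNX_TO_NUMPY` table and `coerce_dtype` function are illustrative, not part of onnxruntime's API. It maps the type string that `session.get_inputs()[i].type` reports to a NumPy dtype and casts only when needed:

```python
import numpy as np

# Illustrative mapping from ONNX type strings (as reported by
# session.get_inputs()[i].type) to NumPy dtypes. Not an onnxruntime API.
ONNX_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
    "tensor(int32)": np.int32,
}

def coerce_dtype(arr, onnx_type):
    """Cast arr to the dtype the model declares, if the type string is known."""
    want = ONNX_TO_NUMPY.get(onnx_type)
    if want is not None and arr.dtype != want:
        return arr.astype(want)
    return arr

x = np.random.rand(1, 2)                   # NumPy defaults to float64
x_cast = coerce_dtype(x, "tensor(float)")  # float32, safe to pass to session.run()
```

Casting up front is cheaper than debugging `INVALID_ARGUMENT` at run time, and the lookup table keeps the cast driven by the model's own metadata rather than hard-coded dtypes.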
Warnings
- breaking The `onnxruntime` and `onnxruntime-gpu` packages are mutually exclusive. Only one should be installed in a given Python environment. Installing both can lead to unexpected behavior or errors.
- breaking Since ONNX Runtime 1.10, execution providers (like CUDAExecutionProvider for GPU) must be explicitly specified when creating an `InferenceSession`. If not specified, it defaults to `CPUExecutionProvider` only. Older code that relied on implicit GPU usage will break or silently fall back to CPU.
- gotcha Input data shapes and data types must precisely match the ONNX model's expected inputs, otherwise `onnxruntime` will raise `INVALID_ARGUMENT` errors. Common mistakes include incorrect dimensions or using `float64` instead of the expected `float32`.
- gotcha When using `onnxruntime-gpu`, correct installation of the CUDA Toolkit and cuDNN libraries matching your `onnxruntime-gpu` version is crucial. Mismatched versions are a frequent cause of 'DLL not found' or 'Failed to create session' errors.
- deprecated The `generate()` API for generative AI models underwent breaking changes from ONNX Runtime GenAI 0.5.2 to 0.6.0, notably replacing `params.input_ids = input_tokens` with `generator.append_tokens(input_tokens)` and removing `generator.compute_logits()`.
- gotcha For pre- and post-processing steps using custom ONNX operators, the `onnxruntime-extensions` package must be separately installed and its custom operators registered with the `InferenceSession` via `session_options.register_custom_ops_library()`.
- breaking `onnxruntime` relies heavily on pre-built wheels. `pip` may fail to find a matching distribution if wheels are not available for your specific Python version (e.g., Python 3.13+) or operating system/architecture (e.g., Alpine Linux which uses musl libc, or ARM64 where fewer wheels are available). Trying to build from source can be complex and often fails.
Install
- `pip install onnxruntime`
- `pip install onnxruntime-gpu`
- `pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/` (CUDA 11 builds)
Imports
- InferenceSession
from onnxruntime import InferenceSession
- SessionOptions
from onnxruntime import SessionOptions
- get_available_providers
from onnxruntime import get_available_providers
- PyOrtFunction
from onnxruntime_extensions import PyOrtFunction
Quickstart
import onnxruntime as ort
import numpy as np
import os
# NOTE: This example assumes you have an ONNX model file named 'model.onnx'.
# You can typically export models from frameworks like PyTorch or TensorFlow to ONNX format.
# For a runnable example, you'd need to create a dummy model:
# e.g., using ONNX library:
# import onnx
# from onnx import TensorProto
# from onnx.helper import make_model, make_node, make_graph, make_tensor_value_info
# X = make_tensor_value_info('input', TensorProto.FLOAT, [None, 2])
# Y = make_tensor_value_info('output', TensorProto.FLOAT, [None, 2])
# node = make_node('Add', ['input', 'input'], ['output'])
# graph = make_graph([node], 'simple-graph', [X], [Y])
# onnx_model = make_model(graph)
# onnx.save(onnx_model, 'model.onnx')
model_path = os.environ.get('ONNX_MODEL_PATH', 'model.onnx')
# 1. Create an InferenceSession
# For GPU, add providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
session = ort.InferenceSession(model_path, providers=ort.get_available_providers())
print("Model inputs:")
for input_meta in session.get_inputs():
    print(f"  Name: {input_meta.name}, Shape: {input_meta.shape}, Type: {input_meta.type}")
print("\nModel outputs:")
for output_meta in session.get_outputs():
    print(f"  Name: {output_meta.name}, Shape: {output_meta.shape}, Type: {output_meta.type}")
# 2. Prepare input data (example for a model expecting a float32 array)
# Assuming the first input expects a 2D float32 array, e.g., shape (1, 2)
input_name = session.get_inputs()[0].name
input_shape = [dim if isinstance(dim, int) else 1 for dim in session.get_inputs()[0].shape] # Handle dynamic shapes
input_data = np.random.randn(*input_shape).astype(np.float32)
# 3. Run inference
outputs = session.run(None, {input_name: input_data})
# 4. Process outputs
print(f"\nOutput data type: {outputs[0].dtype}")
print(f"Output shape: {outputs[0].shape}")
print(f"Output data (first 5 elements): {outputs[0].flatten()[:5]}")