ONNX Runtime

1.24.4 · verified Tue May 12 · auth: no · python · install: draft · quickstart: stale

ONNX Runtime is a cross-platform, high-performance machine learning inference and training accelerator. It enables faster customer experiences and lower costs by supporting models from various deep learning frameworks (e.g., PyTorch, TensorFlow/Keras) and classical ML libraries (e.g., scikit-learn). The library is actively maintained with new releases approximately quarterly, including patch releases, and commits to backwards compatibility.

pip install onnxruntime
error ModuleNotFoundError: No module named 'onnxruntime'
cause The 'onnxruntime' module is not installed in the Python environment.
fix
Install the module using pip: 'pip install onnxruntime'.
error AttributeError: module 'onnxruntime' has no attribute 'InferenceSession'
cause The 'onnxruntime' module is either not installed correctly or the script's filename conflicts with the module name.
fix
Ensure 'onnxruntime' is installed correctly and rename any script named 'onnxruntime.py' to avoid conflicts.
error AttributeError: module 'onnxruntime' has no attribute 'OrtValue'
cause The 'OrtValue' attribute is not available in the installed version of 'onnxruntime'.
fix
Upgrade 'onnxruntime' to the latest version using pip: 'pip install --upgrade onnxruntime'.
error AttributeError: module 'onnxruntime' has no attribute 'SessionOptions'
cause The 'SessionOptions' attribute is not available in the installed version of 'onnxruntime'.
fix
Upgrade 'onnxruntime' to the latest version using pip: 'pip install --upgrade onnxruntime'.
error [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(double)) , expected: (tensor(float)).
cause The input data provided to the ONNX Runtime session has a data type (e.g., `numpy.float64`) that does not match the expected data type of the ONNX model (e.g., `numpy.float32`).
fix
Convert your input data to the expected type (typically `numpy.float32`) before feeding it to the ONNX Runtime session, e.g., `input_data.astype(numpy.float32)`.
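The conversion itself is plain NumPy, so it can be sketched without a model; the final `session.run` call is shown commented out because it assumes a loaded session and input name:

```python
import numpy as np

# NumPy float operations default to float64, which ONNX models rarely accept
raw = np.random.randn(1, 3, 224, 224)
assert raw.dtype == np.float64

# Most ONNX models declare tensor(float), i.e. float32
input_data = raw.astype(np.float32)
assert input_data.dtype == np.float32

# outputs = session.run(None, {input_name: input_data})  # would now pass the type check
```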
breaking The `onnxruntime` and `onnxruntime-gpu` packages are mutually exclusive. Only one should be installed in a given Python environment. Installing both can lead to unexpected behavior or errors.
fix Uninstall any conflicting packages (`pip uninstall onnxruntime onnxruntime-gpu`) before installing the desired version.
breaking Since ONNX Runtime 1.10, execution providers (like CUDAExecutionProvider for GPU) must be explicitly specified when creating an `InferenceSession`. If not specified, it defaults to `CPUExecutionProvider` only. Older code that relied on implicit GPU usage will break or silently fall back to CPU.
fix Pass a list of providers to `ort.InferenceSession(model_path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])`. Use `ort.get_available_providers()` to see what's available.
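The providers list is ordered by preference, and ONNX Runtime falls back down the list. A minimal pure-Python sketch of that selection logic (the provider strings are real, but the `pick_providers` helper is hypothetical; in practice you would pass the result of `ort.get_available_providers()` as `available`):

```python
def pick_providers(available, preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, preserving order."""
    chosen = [p for p in preferred if p in available]
    # Always fall back to CPU so session creation cannot fail on provider choice
    return chosen or ["CPUExecutionProvider"]

# On a CPU-only machine, get_available_providers() would report only the CPU provider:
print(pick_providers(["CPUExecutionProvider"]))  # ['CPUExecutionProvider']
# With CUDA available, GPU is preferred and CPU kept as fallback:
print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
```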
gotcha Input data shapes and data types must precisely match the ONNX model's expected inputs, otherwise `onnxruntime` will raise `INVALID_ARGUMENT` errors. Common mistakes include incorrect dimensions or using `float64` instead of the expected `float32`.
fix Inspect `session.get_inputs()` to verify expected `shape` and `type`. Ensure NumPy arrays are created with the correct `dtype` (e.g., `np.float32`). Reshape inputs using `np.reshape` or `np.expand_dims` as needed.
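A validation step like this can be written against the metadata shape/type strings that `session.get_inputs()` returns, without needing a session. The `check_input` helper and the `ONNX_TO_NUMPY` subset below are hypothetical illustrations, assuming the documented metadata format (ints mixed with strings/`None` for dynamic dimensions, type strings like `'tensor(float)'`):

```python
import numpy as np

# Subset of ONNX element-type strings mapped to NumPy dtypes (hypothetical helper table)
ONNX_TO_NUMPY = {
    "tensor(float)": np.float32,
    "tensor(double)": np.float64,
    "tensor(int64)": np.int64,
}

def check_input(arr, expected_shape, expected_type):
    """Validate a NumPy array against input metadata; dynamic dims (str/None) match anything."""
    if arr.dtype != ONNX_TO_NUMPY[expected_type]:
        raise TypeError(f"dtype {arr.dtype} != expected {expected_type}")
    if len(arr.shape) != len(expected_shape):
        raise ValueError(f"rank {len(arr.shape)} != expected rank {len(expected_shape)}")
    for actual, expected in zip(arr.shape, expected_shape):
        if isinstance(expected, int) and actual != expected:
            raise ValueError(f"dimension {actual} != expected {expected}")
    return True

# e.g. a model input declared with shape ['batch', 3, 224, 224] and type 'tensor(float)'
x = np.zeros((1, 3, 224, 224), dtype=np.float32)
print(check_input(x, ["batch", 3, 224, 224], "tensor(float)"))  # True
```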
gotcha When using `onnxruntime-gpu`, correct installation of the CUDA Toolkit and cuDNN libraries matching your `onnxruntime-gpu` version is crucial. Mismatched versions are a frequent cause of 'DLL not found' or 'Failed to create session' errors.
fix Consult the ONNX Runtime documentation for the exact CUDA/cuDNN version requirements for your `onnxruntime-gpu` package. Ensure they are installed and discoverable in your system's PATH/LD_LIBRARY_PATH.
deprecated The `generate()` API for generative AI models underwent breaking changes from ONNX Runtime GenAI 0.5.2 to 0.6.0, notably replacing `params.input_ids = input_tokens` with `generator.append_tokens(input_tokens)` and removing `generator.compute_logits()`.
fix Update `onnxruntime-genai` code to use the new `append_tokens` method and remove `compute_logits` calls, as detailed in the migration guide.
gotcha For pre- and post-processing steps using custom ONNX operators, the `onnxruntime-extensions` package must be separately installed and its custom operators registered with the `InferenceSession` via `session_options.register_custom_ops_library()`.
fix Install `pip install onnxruntime-extensions` and use `so = ort.SessionOptions(); so.register_custom_ops_library(get_library_path()); sess = ort.InferenceSession(model, sess_options=so)` where `get_library_path` comes from `onnxruntime_extensions`.
breaking `onnxruntime` relies heavily on pre-built wheels. `pip` may fail to find a matching distribution if wheels are not available for your specific Python version (e.g., Python 3.13+) or operating system/architecture (e.g., Alpine Linux which uses musl libc, or ARM64 where fewer wheels are available). Trying to build from source can be complex and often fails.
fix Use a supported environment: a recent CPython (the compatibility matrix below shows 3.9-3.13 installing successfully) on a glibc-based distribution (e.g., Debian/Ubuntu slim images rather than Alpine/musl). Check the official ONNX Runtime documentation for supported environments and available wheels.
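A quick environment check before running pip can save a failed resolve. This sketch uses only the standard library; the version window in the heuristic is an assumption drawn from the compatibility matrix below, not an authoritative support statement:

```python
import platform
import sys

major, minor = sys.version_info[:2]
# libc_ver() reports ('glibc', '2.x') on Debian/Ubuntu; typically ('', '') on musl/Alpine
libc, _ = platform.libc_ver()

print(f"Python {major}.{minor}, arch={platform.machine()}, "
      f"libc={libc or 'unknown (possibly musl)'}")

# Heuristic only: prebuilt wheels are generally published for recent CPython on glibc
likely_supported = (3, 9) <= (major, minor) <= (3, 13) and libc == "glibc"
print("Prebuilt wheel likely available:", likely_supported)
```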
pip install onnxruntime-gpu
pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-11/pypi/simple/
python os / libc variant package status wheel install import disk
3.10 alpine (musl) onnxruntime build_error - - - -
3.10 alpine (musl) onnxruntime-gpu build_error - - - -
3.10 alpine (musl) onnxruntime-gpu build_error - - - -
3.10 alpine (musl) onnxruntime - - - -
3.10 alpine (musl) onnxruntime-gpu - - - -
3.10 alpine (musl) onnxruntime-gpu - - - -
3.10 slim (glibc) onnxruntime wheel 9.9s 0.26s 201M
3.10 slim (glibc) onnxruntime-gpu wheel 18.2s 0.16s 1.1G
3.10 slim (glibc) onnxruntime-gpu wheel 15.3s 0.28s 1.1G
3.10 slim (glibc) onnxruntime - - 0.17s 201M
3.10 slim (glibc) onnxruntime-gpu - - 0.21s 1.1G
3.10 slim (glibc) onnxruntime-gpu - - 0.20s 1.1G
3.11 alpine (musl) onnxruntime build_error - - - -
3.11 alpine (musl) onnxruntime-gpu build_error - - - -
3.11 alpine (musl) onnxruntime-gpu build_error - - - -
3.11 alpine (musl) onnxruntime - - - -
3.11 alpine (musl) onnxruntime-gpu - - - -
3.11 alpine (musl) onnxruntime-gpu - - - -
3.11 slim (glibc) onnxruntime wheel 4.9s 0.35s 151M
3.11 slim (glibc) onnxruntime-gpu wheel 12.1s 0.28s 939M
3.11 slim (glibc) onnxruntime-gpu wheel 11.1s 0.37s 939M
3.11 slim (glibc) onnxruntime - - 0.30s 234M
3.11 slim (glibc) onnxruntime-gpu - - 0.30s 973M
3.11 slim (glibc) onnxruntime-gpu - - 0.29s 973M
3.12 alpine (musl) onnxruntime build_error - - - -
3.12 alpine (musl) onnxruntime-gpu build_error - - - -
3.12 alpine (musl) onnxruntime-gpu build_error - - - -
3.12 alpine (musl) onnxruntime - - - -
3.12 alpine (musl) onnxruntime-gpu - - - -
3.12 alpine (musl) onnxruntime-gpu - - - -
3.12 slim (glibc) onnxruntime wheel 4.7s 0.36s 139M
3.12 slim (glibc) onnxruntime-gpu wheel 12.2s 0.29s 926M
3.12 slim (glibc) onnxruntime-gpu wheel 10.2s 0.33s 925M
3.12 slim (glibc) onnxruntime - - 0.31s 216M
3.12 slim (glibc) onnxruntime-gpu - - 0.30s 954M
3.12 slim (glibc) onnxruntime-gpu - - 0.31s 954M
3.13 alpine (musl) onnxruntime build_error - - - -
3.13 alpine (musl) onnxruntime-gpu build_error - - - -
3.13 alpine (musl) onnxruntime-gpu build_error - - - -
3.13 alpine (musl) onnxruntime - - - -
3.13 alpine (musl) onnxruntime-gpu - - - -
3.13 alpine (musl) onnxruntime-gpu - - - -
3.13 slim (glibc) onnxruntime wheel 4.8s 0.33s 139M
3.13 slim (glibc) onnxruntime-gpu wheel 11.7s 0.33s 924M
3.13 slim (glibc) onnxruntime-gpu wheel 9.9s 0.28s 924M
3.13 slim (glibc) onnxruntime - - 0.31s 215M
3.13 slim (glibc) onnxruntime-gpu - - 0.33s 951M
3.13 slim (glibc) onnxruntime-gpu - - 0.31s 952M
3.9 alpine (musl) onnxruntime build_error - - - -
3.9 alpine (musl) onnxruntime-gpu build_error - - - -
3.9 alpine (musl) onnxruntime-gpu build_error - - - -
3.9 alpine (musl) onnxruntime - - - -
3.9 alpine (musl) onnxruntime-gpu - - - -
3.9 alpine (musl) onnxruntime-gpu - - - -
3.9 slim (glibc) onnxruntime wheel 11.0s 0.28s 199M
3.9 slim (glibc) onnxruntime-gpu wheel 58.1s 0.20s 1.2G
3.9 slim (glibc) onnxruntime-gpu wheel 16.5s 0.27s 1.2G
3.9 slim (glibc) onnxruntime - - 0.19s 199M
3.9 slim (glibc) onnxruntime-gpu - - 0.19s 1.2G
3.9 slim (glibc) onnxruntime-gpu - - 0.19s 1.2G

This quickstart demonstrates how to load an ONNX model, inspect its inputs and outputs, prepare sample input data using NumPy, and perform inference using `onnxruntime.InferenceSession`. It also shows how to configure execution providers for CPU or GPU inference. A `model.onnx` file is required for this code to run; a comment in the code suggests how to create a simple dummy model using the `onnx` library.

import onnxruntime as ort
import numpy as np
import os

# NOTE: This example assumes you have an ONNX model file named 'model.onnx'.
# You can typically export models from frameworks like PyTorch or TensorFlow to ONNX format.
# For a runnable example, you'd need to create a dummy model:
# e.g., using ONNX library:
# import onnx
# from onnx import TensorProto
# from onnx.helper import make_model, make_node, make_graph, make_tensor_value_info
# X = make_tensor_value_info('input', TensorProto.FLOAT, [None, 2])
# Y = make_tensor_value_info('output', TensorProto.FLOAT, [None, 2])
# node = make_node('Add', ['input', 'input'], ['output'])
# graph = make_graph([node], 'simple-graph', [X], [Y])
# onnx_model = make_model(graph)
# onnx.save(onnx_model, 'model.onnx')

model_path = os.environ.get('ONNX_MODEL_PATH', 'model.onnx')

# 1. Create an InferenceSession
# For GPU, add providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
session = ort.InferenceSession(model_path, providers=ort.get_available_providers())

print("Model inputs:")
for input_meta in session.get_inputs():
    print(f"  Name: {input_meta.name}, Shape: {input_meta.shape}, Type: {input_meta.type}")

print("\nModel outputs:")
for output_meta in session.get_outputs():
    print(f"  Name: {output_meta.name}, Shape: {output_meta.shape}, Type: {output_meta.type}")

# 2. Prepare input data (example for a model expecting a float32 array)
# Assuming the first input expects a 2D float32 array, e.g., shape (1, 2)
input_name = session.get_inputs()[0].name
input_shape = [dim if isinstance(dim, int) else 1 for dim in session.get_inputs()[0].shape] # Handle dynamic shapes
input_data = np.random.randn(*input_shape).astype(np.float32)

# 3. Run inference
outputs = session.run(None, {input_name: input_data})

# 4. Process outputs
print(f"\nOutput data type: {outputs[0].dtype}")
print(f"Output shape: {outputs[0].shape}")
print(f"Output data (first 5 elements): {outputs[0].flatten()[:5]}")