ONNX Runtime

1.24.4 · active · verified Sat Mar 28

ONNX Runtime is a cross-platform, high-performance machine learning inference and training accelerator. It enables faster customer experiences and lower costs by supporting models from various deep learning frameworks (e.g., PyTorch, TensorFlow/Keras) and classical ML libraries (e.g., scikit-learn). The library is actively maintained with new releases approximately quarterly, including patch releases, and commits to backwards compatibility.

Warnings

Install
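The package is published on PyPI; a typical install looks like the following (package names per the project's published wheels):

```shell
# CPU-only build
pip install onnxruntime

# Or, for NVIDIA GPU (CUDA) support, install the GPU build instead:
# pip install onnxruntime-gpu
```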

Imports

Quickstart

This quickstart demonstrates how to load an ONNX model, inspect its inputs and outputs, prepare sample input data using NumPy, and perform inference using `onnxruntime.InferenceSession`. It also shows how to configure execution providers for CPU or GPU inference. A `model.onnx` file is required for this code to run; a comment in the code suggests how to create a simple dummy model using the `onnx` library.

import onnxruntime as ort
import numpy as np
import os

# NOTE: This example assumes you have an ONNX model file named 'model.onnx'.
# You can typically export models from frameworks like PyTorch or TensorFlow to ONNX format.
# For a runnable example, you'd need to create a dummy model:
# e.g., using ONNX library:
# import onnx
# from onnx import TensorProto
# from onnx.helper import make_model, make_node, make_graph, make_tensor_value_info
# X = make_tensor_value_info('input', TensorProto.FLOAT, [None, 2])
# Y = make_tensor_value_info('output', TensorProto.FLOAT, [None, 2])
# node = make_node('Add', ['input', 'input'], ['output'])
# graph = make_graph([node], 'simple-graph', [X], [Y])
# onnx_model = make_model(graph)
# onnx.save(onnx_model, 'model.onnx')

model_path = os.environ.get('ONNX_MODEL_PATH', 'model.onnx')

# 1. Create an InferenceSession
# get_available_providers() lists the providers compiled into this build, in
# default priority order (e.g. CUDA before CPU when the GPU build is installed).
# To force CPU-only inference, pass providers=['CPUExecutionProvider'] instead;
# for GPU with CPU fallback, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'].
session = ort.InferenceSession(model_path, providers=ort.get_available_providers())

print("Model inputs:")
for input_meta in session.get_inputs():
    print(f"  Name: {input_meta.name}, Shape: {input_meta.shape}, Type: {input_meta.type}")

print("\nModel outputs:")
for output_meta in session.get_outputs():
    print(f"  Name: {output_meta.name}, Shape: {output_meta.shape}, Type: {output_meta.type}")

# 2. Prepare input data (example for a model expecting a float32 array)
# Dynamic dimensions are reported as strings (e.g. 'batch_size') or None in the
# shape metadata; substitute a concrete size (here 1) before allocating data.
input_name = session.get_inputs()[0].name
input_shape = [dim if isinstance(dim, int) else 1 for dim in session.get_inputs()[0].shape]
input_data = np.random.randn(*input_shape).astype(np.float32)

# 3. Run inference
# The first argument selects which outputs to return; None returns all of them,
# or you can pass a list of output names.
outputs = session.run(None, {input_name: input_data})

# 4. Process outputs
print(f"\nOutput data type: {outputs[0].dtype}")
print(f"Output shape: {outputs[0].shape}")
print(f"Output data (first 5 elements): {outputs[0].flatten()[:5]}")
