ONNX Runtime (GPU)

1.24.4 · active · verified Thu Apr 09

ONNX Runtime is a high-performance inference engine for ONNX models. The `onnxruntime-gpu` package adds GPU acceleration (e.g., via CUDA or ROCm) on top of the core runtime. It is actively developed by Microsoft, with frequent releases that track new ONNX operator sets and performance improvements; the current version is 1.24.4.

Warnings

Do not install `onnxruntime` (CPU) and `onnxruntime-gpu` in the same environment: both packages provide the same `onnxruntime` module and will shadow each other. The CUDA execution provider also requires CUDA and cuDNN versions that match the installed release; mismatched libraries typically surface as the session falling back to CPU execution.

Install
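A typical installation from PyPI (the GPU package replaces, rather than supplements, the CPU `onnxruntime` package):

```shell
pip install onnxruntime-gpu
```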

Imports

Quickstart

This quickstart demonstrates how to create a simple ONNX model, save it, and then load it into an `InferenceSession` configured to prioritize GPU (CUDA) execution. It includes error handling for common GPU setup issues.

import onnxruntime as ort
import numpy as np
import onnx
from onnx import helper, TensorProto
import os

# 1. Create a dummy ONNX model for demonstration
# Define the graph (input, output, and node)
X = helper.make_tensor_value_info('X', TensorProto.FLOAT, [None, 3])
Y = helper.make_tensor_value_info('Y', TensorProto.FLOAT, [None, 3])
node = helper.make_node('Relu', ['X'], ['Y'])
graph = helper.make_graph([node], 'simple_relu', [X], [Y])
# Pin a widely supported opset so older runtimes can still load the model
model = helper.make_model(
    graph,
    producer_name='onnx-example',
    opset_imports=[helper.make_opsetid('', 13)],
)
onnx.checker.check_model(model)

# Save it to a temporary file
model_path = "simple_relu.onnx"
onnx.save(model, model_path)

# 2. Load the model with GPU provider
try:
    # Prioritize CUDAExecutionProvider for NVIDIA GPUs
    # Fallback to CPUExecutionProvider if CUDA is not available or fails
    session = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
    )
    print("ONNX Runtime session created with providers:", session.get_providers())
    
    # Prepare dummy input data
    input_data = np.random.rand(1, 3).astype(np.float32)
    
    # Run inference
    output = session.run(None, {'X': input_data})
    print("Inference successful. Output shape:", output[0].shape)

except Exception as e:
    print(f"\nError creating ONNX Runtime session or running inference: {e}")
    print("Make sure you have a compatible CUDA environment (or other GPU runtime) ")
    print("and the correct onnxruntime-gpu package installed. \n")
    print("If CUDA is not available, try removing 'CUDAExecutionProvider' from the providers list.")

finally:
    # Clean up the dummy model file
    if os.path.exists(model_path):
        os.remove(model_path)
