ONNX Runtime (GPU)
ONNX Runtime is a high-performance inference engine for ONNX models. The `onnxruntime-gpu` package adds GPU acceleration (e.g., via CUDA or ROCm) on top of the core ONNX Runtime. It is actively developed by Microsoft, with frequent releases that track new ONNX operator sets and performance improvements; the current version is 1.24.4.
Common errors
- LoadLibrary failed with error 126 "" when trying to load "onnxruntime_providers_cuda.dll"
  Cause: This error, or related ones such as 'Failed to create CUDAExecutionProvider. Require cuDNN X and CUDA Y' or 'libcublasLt.so.11: cannot open shared object file: No such file or directory', typically indicates a mismatch between the CUDA Toolkit and cuDNN versions installed on your system and those your specific `onnxruntime-gpu` package was built against, or that the required CUDA/cuDNN libraries are not found via your system's PATH (Windows) or LD_LIBRARY_PATH (Linux) environment variables.
  Fix: Consult the official ONNX Runtime documentation (e.g., onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements) for the exact CUDA and cuDNN versions compatible with your `onnxruntime-gpu` version. Install those versions and add their `bin` (Windows) or `lib64` (Linux) directories to the corresponding environment variable. If you also use PyTorch, importing `torch` before `onnxruntime` can sometimes help preload the necessary CUDA DLLs.
- ModuleNotFoundError: No module named 'onnxruntime'
  Cause: Neither `onnxruntime` nor `onnxruntime-gpu` is installed in the Python environment you are currently using, or there is a typo in the import statement.
  Fix: Install the correct package with pip: `pip install onnxruntime-gpu` (for GPU support) or `pip install onnxruntime` (for CPU-only). Make sure you install it into the same environment or virtual environment that runs your script.
- Specified provider 'CUDAExecutionProvider' is not in available provider names. Available providers: ['CPUExecutionProvider']
  Cause: You requested the `CUDAExecutionProvider`, but `onnxruntime-gpu` failed to initialize its CUDA dependencies, or there is a package conflict, most commonly because both `onnxruntime` (the CPU-only package) and `onnxruntime-gpu` are installed at the same time.
  Fix: Uninstall both packages with `pip uninstall onnxruntime onnxruntime-gpu`, then reinstall *only* the GPU-enabled package: `pip install onnxruntime-gpu`. Afterwards, verify that your CUDA and cuDNN installations and environment variables meet ONNX Runtime's requirements.
- [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid rank for input: X Got: Y Expected: Z
  Cause: The input tensor passed to the model during inference does not match the shape (rank or dimensions) or data type declared in the ONNX model's graph.
  Fix: Inspect the model's expected input shapes and data types, then adjust the NumPy array (or equivalent) you pass to `session.run()` to match them exactly. Tools such as `netron` can visualize the model and its input specifications.
Warnings
- Gotcha: The `onnxruntime-gpu` package requires a specific CUDA Toolkit and cuDNN version to be installed on your system. Mismatched versions are a very common cause of `InferenceSession` initialization failures or runtime errors.
- Gotcha: When using `onnxruntime-gpu`, explicitly pass execution providers such as `['CUDAExecutionProvider', 'CPUExecutionProvider']` when creating the `InferenceSession` so GPU acceleration is attempted. If not specified, ONNX Runtime might default to CPU execution even with the GPU package installed.
- Gotcha: There are two main PyPI packages: `onnxruntime` (CPU-only) and `onnxruntime-gpu` (GPU-enabled). Installing `onnxruntime-gpu` does *not* automatically remove `onnxruntime`. If both are installed, `onnxruntime` might be used by default or cause conflicts, leading to unexpected CPU-only execution.
- Breaking: Starting with ONNX Runtime version 1.17, official support for Python 3.8 and 3.9 was dropped. Version 1.24.0 and later also dropped support for Python 3.10; the current version (1.24.4) requires Python >= 3.11.
Install
- `pip install onnxruntime-gpu`
Imports
- InferenceSession
  import onnxruntime as ort  # note: the onnxruntime-gpu package is still imported as `onnxruntime`; there is no `onnxruntime_gpu` module
  session = ort.InferenceSession(...)
Quickstart
import onnxruntime as ort
import numpy as np
import onnx
from onnx import helper, TensorProto
import os
# 1. Create a dummy ONNX model for demonstration
# Define the graph (input, output, and node)
X = helper.make_tensor_value_info('X', TensorProto.FLOAT, [None, 3])
Y = helper.make_tensor_value_info('Y', TensorProto.FLOAT, [None, 3])
node = helper.make_node('Relu', ['X'], ['Y'])
graph = helper.make_graph([node], 'simple_relu', [X], [Y])
model = helper.make_model(graph, producer_name='onnx-example')
# Save it to a temporary file
model_path = "simple_relu.onnx"
onnx.save(model, model_path)
# 2. Load the model with GPU provider
try:
    # Prioritize CUDAExecutionProvider for NVIDIA GPUs;
    # fall back to CPUExecutionProvider if CUDA is not available or fails.
    session = ort.InferenceSession(
        model_path,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
    )
    print("ONNX Runtime session created with providers:", session.get_providers())

    # Prepare dummy input data
    input_data = np.random.rand(1, 3).astype(np.float32)

    # Run inference
    output = session.run(None, {'X': input_data})
    print("Inference successful. Output shape:", output[0].shape)
except Exception as e:
    print(f"\nError creating ONNX Runtime session or running inference: {e}")
    print("Make sure you have a compatible CUDA environment (or other GPU runtime)")
    print("and the correct onnxruntime-gpu package installed.\n")
    print("If CUDA is not available, try removing 'CUDAExecutionProvider' from the providers list.")
finally:
    # Clean up the dummy model file
    if os.path.exists(model_path):
        os.remove(model_path)