{"id":2143,"library":"onnxruntime-gpu","title":"ONNX Runtime (GPU)","description":"ONNX Runtime is a high-performance inference engine for ONNX models. The `onnxruntime-gpu` package provides GPU acceleration (e.g., via CUDA, ROCm) for these models, building on the core ONNX Runtime. It's actively developed by Microsoft, with frequent releases often aligned with new ONNX operator sets and performance improvements, currently at version 1.24.4.","status":"active","version":"1.24.4","language":"en","source_language":"en","source_url":"https://github.com/microsoft/onnxruntime","tags":["AI","ML","inference","deep learning","GPU","ONNX","CUDA","ROCm"],"install":[{"cmd":"pip install onnxruntime-gpu","lang":"bash","label":"Install for CUDA-enabled GPUs"}],"dependencies":[],"imports":[{"note":"The core library is imported as 'onnxruntime', even when using the GPU-specific package 'onnxruntime-gpu'.","wrong":"import onnxruntime_gpu","symbol":"InferenceSession","correct":"import onnxruntime as ort\nsession = ort.InferenceSession(...)"}],"quickstart":{"code":"import onnxruntime as ort\nimport numpy as np\nimport onnx\nfrom onnx import helper, TensorProto\nimport os\n\n# 1. Create a dummy ONNX model for demonstration\n# Define the graph (input, output, and node)\nX = helper.make_tensor_value_info('X', TensorProto.FLOAT, [None, 3])\nY = helper.make_tensor_value_info('Y', TensorProto.FLOAT, [None, 3])\nnode = helper.make_node('Relu', ['X'], ['Y'])\ngraph = helper.make_graph([node], 'simple_relu', [X], [Y])\nmodel = helper.make_model(graph, producer_name='onnx-example')\n\n# Save it to a temporary file\nmodel_path = \"simple_relu.onnx\"\nonnx.save(model, model_path)\n\n# 2. 
Load the model, preferring the GPU provider\ntry:\n    # Report which execution providers this ONNX Runtime build supports\n    print(\"Available providers:\", ort.get_available_providers())\n\n    # Prioritize CUDAExecutionProvider for NVIDIA GPUs; fall back to\n    # CPUExecutionProvider if CUDA is not available or fails to initialize\n    session = ort.InferenceSession(\n        model_path,\n        providers=[\"CUDAExecutionProvider\", \"CPUExecutionProvider\"]\n    )\n    print(\"ONNX Runtime session created with providers:\", session.get_providers())\n\n    # Prepare dummy input data\n    input_data = np.random.rand(1, 3).astype(np.float32)\n\n    # Run inference; the first argument (None) requests all model outputs\n    output = session.run(None, {'X': input_data})\n    print(\"Inference successful. Output shape:\", output[0].shape)\n\nexcept Exception as e:\n    print(f\"\\nError creating ONNX Runtime session or running inference: {e}\")\n    print(\"Make sure you have a compatible CUDA environment (or other GPU runtime)\")\n    print(\"and the correct onnxruntime-gpu package installed.\\n\")\n    print(\"If CUDA is not available, try removing 'CUDAExecutionProvider' from the providers list.\")\n\nfinally:\n    # Clean up the dummy model file\n    if os.path.exists(model_path):\n        os.remove(model_path)\n","lang":"python","description":"This quickstart demonstrates how to create a simple ONNX model, save it, and then load it into an `InferenceSession` configured to prioritize GPU (CUDA) execution, with error handling for common GPU setup issues."},"warnings":[{"fix":"Consult the official ONNX Runtime documentation (e.g., 'Build ONNX Runtime from source' or release notes) for the exact CUDA/cuDNN versions compatible with your `onnxruntime-gpu` version and ensure they are correctly installed and configured in your system environment (e.g., `PATH`, `LD_LIBRARY_PATH`).","message":"The `onnxruntime-gpu` package requires a specific CUDA Toolkit and cuDNN version to be installed on your system. 
Mismatched versions are a very common cause of `InferenceSession` initialization failures or runtime errors.","severity":"gotcha","affected_versions":"All `onnxruntime-gpu` versions"},{"fix":"Always pass `providers=[\"CUDAExecutionProvider\", \"CPUExecutionProvider\"]` (or `ROCMExecutionProvider` for AMD GPUs) to `onnxruntime.InferenceSession()` to prioritize GPU and gracefully fall back to CPU if GPU isn't available or fails.","message":"When using `onnxruntime-gpu`, you must explicitly specify execution providers like `['CUDAExecutionProvider', 'CPUExecutionProvider']` during `InferenceSession` creation to ensure GPU acceleration is attempted. If not specified, ONNX Runtime might default to CPU execution even with the GPU package installed.","severity":"gotcha","affected_versions":"All `onnxruntime-gpu` versions"},{"fix":"Before installing `onnxruntime-gpu`, uninstall `onnxruntime` if it was previously installed (`pip uninstall onnxruntime`). Verify with `pip freeze | grep onnxruntime` that only the desired package is present.","message":"There are two main PyPI packages: `onnxruntime` (CPU-only) and `onnxruntime-gpu` (GPU-enabled). Installing `onnxruntime-gpu` does *not* automatically remove `onnxruntime`. If both are installed, `onnxruntime` might be used by default or cause conflicts, leading to unexpected CPU-only execution.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade your Python environment to 3.11 or newer. If you must use an older Python version, install an older compatible `onnxruntime-gpu` version (e.g., `pip install \"onnxruntime-gpu<1.24\"` for Python 3.10, or `pip install \"onnxruntime-gpu<1.17\"` for Python 3.8/3.9; quote the version specifier so the shell does not treat `<` as redirection, and be aware of security and feature limitations).","message":"Starting with ONNX Runtime version 1.17, official support for Python 3.8 and 3.9 was dropped. Version 1.24.0 and later also dropped support for Python 3.10. 
The current version (1.24.4) explicitly requires Python >= 3.11.","severity":"breaking","affected_versions":">= 1.17.0 (Python 3.8/3.9), >= 1.24.0 (Python 3.10)"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}