ONNX Runtime GenAI
ONNX Runtime GenAI is a Python library that provides an easy, flexible, and performant way to run Generative AI models (Large Language Models and multi-modal models) on-device and in the cloud using ONNX Runtime. It encapsulates the complete generative AI loop, including pre- and post-processing, inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. The library is actively developed: version 0.13.1 was released in April 2026, and releases generally follow a quarterly cadence in line with the broader ONNX Runtime project.
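The stages of the generative loop named above can be illustrated with a small, dependency-free sketch. This is not the ONNX Runtime GenAI API: `toy_model`, `encode`, `decode`, `generate`, and the four-word vocabulary are hypothetical stand-ins, and greedy search is assumed in place of the library's configurable search and sampling options.

```python
import math

# Hypothetical four-entry vocabulary; a real tokenizer is far larger.
VOCAB = ["<eos>", "hello", "world", "!"]
EOS_ID = 0


def encode(text):
    """Pre-processing: map whitespace-separated words to token ids."""
    return [VOCAB.index(word) for word in text.split()]


def decode(token_ids):
    """Post-processing: map token ids back to text."""
    return " ".join(VOCAB[i] for i in token_ids)


def toy_model(token_id, kv_cache):
    """Stand-in for one inference step (assumption, not a real model).

    Records the token in the cache, then returns one logit per vocabulary
    entry, favoring the next vocabulary entry and wrapping around to <eos>.
    """
    kv_cache.append(token_id)
    target = (token_id + 1) % len(VOCAB)
    return [5.0 if i == target else 0.0 for i in range(len(VOCAB))]


def softmax(logits, temperature=1.0):
    """Logits processing: temperature scaling followed by normalization."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt_ids, max_length=8):
    """Run the loop: inference, logits processing, greedy search, cache."""
    kv_cache = []
    # Prefill: feed every prompt token except the last into the cache.
    for token_id in prompt_ids[:-1]:
        toy_model(token_id, kv_cache)

    output = list(prompt_ids)
    current = prompt_ids[-1]
    while len(output) < max_length:
        probs = softmax(toy_model(current, kv_cache))
        # Greedy search: pick the most probable next token.
        current = max(range(len(probs)), key=probs.__getitem__)
        if current == EOS_ID:
            break
        output.append(current)
    return output, kv_cache


tokens, cache = generate(encode("hello"))
print(decode(tokens))
```

In this sketch the cache is just a list of token ids seen so far; in a real transformer it holds per-layer key and value tensors, so each step only computes attention inputs for the newest token.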
API endpoints
full doc /v1/registry/onnxruntime-genai
compatibility /v1/registry/onnxruntime-genai/compatibility