{"library":"onnxruntime-genai","title":"ONNX Runtime GenAI","description":"ONNX Runtime GenAI is a Python library that provides an easy, flexible, and performant way to run Generative AI models (Large Language Models and multi-modal models) on-device and in the cloud using ONNX Runtime. It encapsulates the complete generative AI loop, including pre- and post-processing, inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. The library is actively developed, with version 0.13.1 released in April 2026, generally following a quarterly release cadence in line with the broader ONNX Runtime project.","language":"python","status":"active","last_verified":"Sun May 17","install":{"commands":["pip install onnxruntime-genai","pip install onnxruntime-genai-directml","pip install onnxruntime-genai-cuda"],"cli":null},"imports":["import onnxruntime_genai as og"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import os\nimport onnxruntime_genai as og\n\n# --- Prerequisite: Download a model ---\n# The following shell command downloads the Phi-3 Mini 4K Instruct ONNX model (CPU-INT4 quantized).\n# You will need to install huggingface_hub: pip install huggingface_hub\n# Run this command in your terminal before executing the Python code:\n# huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx \\\n#   --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* \\\n#   --local-dir ./phi-3-mini-onnx\n\nmodel_path = os.environ.get('ONNX_MODEL_PATH', './phi-3-mini-onnx')\n\ntry:\n    # 1. Load the model\n    model = og.Model(model_path)\n    print(f\"Loaded {model.type} on {model.device_type}\")\n\n    # 2. Create a tokenizer\n    tokenizer = og.Tokenizer(model)\n\n    # 3. Create generator parameters\n    params = og.GeneratorParams(model)\n    params.set_search_options(max_length=200, top_p=0.9, temperature=0.7)\n\n    # 4. 
Encode the initial prompt\n    prompt = \"The capital of France is\"\n    input_tokens = tokenizer.encode(prompt)\n\n    # 5. Create a generator and append the prompt tokens\n    generator = og.Generator(model, params)\n    generator.append_tokens(input_tokens)\n\n    print(f\"Prompt: {prompt}\")\n    print(\"Generated text:\", end=\"\")\n\n    # 6. Generate tokens one by one; the tokenizer stream decodes incrementally\n    #    so that spaces and multi-token characters are printed correctly\n    tokenizer_stream = tokenizer.create_stream()\n    while not generator.is_done():\n        generator.generate_next_token()\n        new_token = generator.get_next_tokens()[0]\n        print(tokenizer_stream.decode(new_token), end=\"\", flush=True)\n    print()\n\n    # Get the full decoded sequence (optional, for non-streaming output)\n    # output = tokenizer.decode(generator.get_sequence(0))\n    # print(f\"\\nFull output: {output}\")\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(f\"Please ensure the model is downloaded to '{model_path}' and all dependencies are installed.\")","lang":"python","description":"This quickstart loads a pre-optimized ONNX model (such as Phi-3 Mini), tokenizes a prompt, and streams generated text with the `onnxruntime-genai` library. Before running the Python code, download an ONNX model into a local directory, for example with `huggingface-cli`. 
The example reads the model path from the `ONNX_MODEL_PATH` environment variable and falls back to `./phi-3-mini-onnx`.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":{"tag":null,"tag_description":null,"last_tested":"2026-05-17","installed_version":"0.4.0","pypi_latest":"0.13.2","is_stale":true,"summary":{"python_range":"3.9–3.13","success_rate":33,"avg_install_s":14.8,"avg_import_s":0.03,"wheel_type":"wheel"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"onnxruntime-genai","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"onnxruntime-genai-cuda","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"onnxruntime-genai-directml","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"onnxruntime-genai","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":12.1,"import_time_s":0.02,"mem_mb":1.1,"disk_size":"364M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"onnxruntime-genai-cuda","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":27.4,"import_time_s":0.02,"mem_mb":1.1,"disk_size":"1.4G"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim 
(glibc)","variant":"onnxruntime-genai-directml","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":1.5,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"onnxruntime-genai","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"onnxruntime-genai-cuda","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"onnxruntime-genai-directml","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"onnxruntime-genai","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":10,"import_time_s":0.03,"mem_mb":1.1,"disk_size":"332M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"onnxruntime-genai-cuda","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":18.1,"import_time_s":0.03,"mem_mb":1.1,"disk_size":"1.3G"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"onnxruntime-genai-directml","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":1.5,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine 
(musl)","variant":"onnxruntime-genai","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"onnxruntime-genai-cuda","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"onnxruntime-genai-directml","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"onnxruntime-genai","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":8.1,"import_time_s":0.03,"mem_mb":1.2,"disk_size":"320M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"onnxruntime-genai-cuda","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":20.8,"import_time_s":0.03,"mem_mb":1.2,"disk_size":"1.3G"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"onnxruntime-genai-directml","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":1.4,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"onnxruntime-genai","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine 
(musl)","variant":"onnxruntime-genai-cuda","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"onnxruntime-genai-directml","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"onnxruntime-genai","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":6.7,"import_time_s":0.03,"mem_mb":1.3,"disk_size":"320M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"onnxruntime-genai-cuda","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":13.4,"import_time_s":0.03,"mem_mb":1.3,"disk_size":"1.3G"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"onnxruntime-genai-directml","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":1.3,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"onnxruntime-genai","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"onnxruntime-genai-cuda","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine 
(musl)","variant":"onnxruntime-genai-directml","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"onnxruntime-genai","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":11.4,"import_time_s":0.08,"mem_mb":1.6,"disk_size":"203M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"onnxruntime-genai-cuda","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":19.5,"import_time_s":null,"mem_mb":null,"disk_size":"1.2G"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"onnxruntime-genai-directml","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":1.7,"import_time_s":null,"mem_mb":null,"disk_size":null}]}}