{"library":"optimum-onnx","title":"Optimum ONNX","description":"Optimum ONNX is an extension of the Hugging Face Optimum library that provides an interface for exporting Hugging Face Transformers models (and models from other libraries such as Diffusers, Timm, and Sentence Transformers) to the ONNX format. It supports inference and deployment with ONNX Runtime, including graph optimization and quantization. Currently at version 0.1.0, it is updated regularly to support new Hugging Face models and to stay compatible with underlying libraries such as PyTorch and Transformers.","language":"python","status":"active","last_verified":"Sun May 17","install":{"commands":["pip install optimum-onnx","pip install \"optimum-onnx[onnxruntime]\"","pip install \"optimum-onnx[onnxruntime-gpu]\""],"cli":null},"imports":["from optimum.onnxruntime import ORTModelForSequenceClassification","from optimum.onnxruntime import pipeline","from transformers import AutoTokenizer","from optimum.onnxruntime.configuration import AutoQuantizationConfig"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import os\nfrom optimum.onnxruntime import ORTModelForSequenceClassification\nfrom optimum.onnxruntime import pipeline as ORTPipeline # Alias to avoid conflict with transformers.pipeline if imported\nfrom transformers import AutoTokenizer\n\nmodel_checkpoint = \"distilbert-base-uncased-finetuned-sst-2-english\"\nsave_directory = \"./tmp/onnx_model\"\n\n# 1. Load a model from transformers and export it to ONNX\nprint(f\"Exporting model {model_checkpoint} to ONNX...\")\nort_model = ORTModelForSequenceClassification.from_pretrained(model_checkpoint, export=True)\ntokenizer = AutoTokenizer.from_pretrained(model_checkpoint)\n\n# 2. Save the ONNX model and tokenizer\nos.makedirs(save_directory, exist_ok=True)\nort_model.save_pretrained(save_directory)\ntokenizer.save_pretrained(save_directory)\nprint(f\"Model and tokenizer saved to {save_directory}\")\n\n# 3. 
Load the exported ONNX model for inference\nprint(f\"Loading ONNX model from {save_directory} for inference...\")\nloaded_ort_model = ORTModelForSequenceClassification.from_pretrained(save_directory, file_name=\"model.onnx\")\nloaded_tokenizer = AutoTokenizer.from_pretrained(save_directory)\n\n# 4. Run inference using the Optimum ONNX Runtime pipeline\ncls_pipeline = ORTPipeline(\"text-classification\", model=loaded_ort_model, tokenizer=loaded_tokenizer)\nresults = cls_pipeline(\"I love using Hugging Face Optimum ONNX!\")\nprint(f\"Inference result: {results}\")\n\n# Example with a quantized model (if applicable)\n# from optimum.onnxruntime.configuration import AutoQuantizationConfig\n# from optimum.onnxruntime import ORTQuantizer\n# qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)\n# quantizer = ORTQuantizer.from_pretrained(ort_model)\n# quantizer.quantize(save_dir=save_directory, quantization_config=qconfig)\n# loaded_quantized_model = ORTModelForSequenceClassification.from_pretrained(save_directory, file_name=\"model_quantized.onnx\")\n# cls_pipeline_quant = ORTPipeline(\"text-classification\", model=loaded_quantized_model, tokenizer=loaded_tokenizer)\n# results_quant = cls_pipeline_quant(\"I love using Hugging Face Optimum ONNX with quantization!\")\n# print(f\"Quantized inference result: {results_quant}\")\n","lang":"python","description":"This quickstart demonstrates the core workflow: exporting a Hugging Face model to ONNX using `ORTModelForSequenceClassification.from_pretrained(export=True)`, saving the exported model and tokenizer, then loading the ONNX model and performing inference with the 
`optimum.onnxruntime.pipeline`.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":{"tag":null,"tag_description":null,"last_tested":"2026-05-17","installed_version":"0.1.0","pypi_latest":"0.1.0","is_stale":false,"summary":{"python_range":"3.9–3.13","success_rate":37,"avg_install_s":80.3,"avg_import_s":17.57,"wheel_type":"wheel"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"onnxruntime-gpu","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"onnxruntime","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"optimum-onnx","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"onnxruntime-gpu","exit_code":1,"wheel_type":null,"failure_reason":"timeout","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"onnxruntime","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":90.7,"import_time_s":13.71,"mem_mb":150,"disk_size":"5.0G"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim 
(glibc)","variant":"optimum-onnx","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":88.6,"import_time_s":null,"mem_mb":null,"disk_size":"4.9G"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"onnxruntime-gpu","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"onnxruntime","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"optimum-onnx","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"onnxruntime-gpu","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":91.1,"import_time_s":18.64,"mem_mb":166.7,"disk_size":"5.4G"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"onnxruntime","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":82.8,"import_time_s":19.15,"mem_mb":166.7,"disk_size":"5.1G"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"optimum-onnx","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":82.1,"import_time_s":null,"mem_mb":null,"disk_size":"5.0G"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine 
(musl)","variant":"onnxruntime-gpu","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"onnxruntime","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"optimum-onnx","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"onnxruntime-gpu","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":80.6,"import_time_s":19.4,"mem_mb":162,"disk_size":"5.4G"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"onnxruntime","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":77.3,"import_time_s":18.68,"mem_mb":162,"disk_size":"5.1G"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"optimum-onnx","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":75.5,"import_time_s":null,"mem_mb":null,"disk_size":"5.0G"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"onnxruntime-gpu","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine 
(musl)","variant":"onnxruntime","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"optimum-onnx","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"onnxruntime-gpu","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":74.7,"import_time_s":16.63,"mem_mb":162.7,"disk_size":"5.4G"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"onnxruntime","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"clean","install_time_s":70,"import_time_s":16.75,"mem_mb":162.7,"disk_size":"5.1G"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"optimum-onnx","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":70.2,"import_time_s":null,"mem_mb":null,"disk_size":"5.0G"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"onnxruntime-gpu","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"onnxruntime","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine 
(musl)","variant":"optimum-onnx","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"onnxruntime-gpu","exit_code":1,"wheel_type":null,"failure_reason":"timeout","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"onnxruntime","exit_code":1,"wheel_type":null,"failure_reason":"timeout","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"optimum-onnx","exit_code":1,"wheel_type":null,"failure_reason":"timeout","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null}]}}