{"id":8373,"library":"onnxruntime-extensions","title":"ONNX Runtime Extensions","description":"ONNX Runtime Extensions is a C/C++ library that extends the capabilities of ONNX models and ONNX Runtime inference via the custom operator ABI. It provides a set of custom operators for common pre- and post-processing tasks in vision, text, and audio models. The library ships bindings for multiple languages and platforms, including Python, Java, C#, and mobile, and is currently at version 0.15.2 with a continuous release cadence.","status":"active","version":"0.15.2","language":"en","source_language":"en","source_url":"https://github.com/microsoft/onnxruntime-extensions","tags":["ONNX","Machine Learning","MLOps","Pre-processing","Post-processing","Custom Operators","NLP","Vision","Audio","Hugging Face","Deep Learning"],"install":[{"cmd":"pip install onnxruntime-extensions","lang":"bash","label":"Stable Release"},{"cmd":"pip install --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ onnxruntime-extensions","lang":"bash","label":"Nightly Build (Windows)"},{"cmd":"python -m pip install git+https://github.com/microsoft/onnxruntime-extensions.git","lang":"bash","label":"Install from Source (Linux/macOS)"}],"dependencies":[{"reason":"Required for ONNX model inference and custom operator registration.","package":"onnxruntime"},{"reason":"Required for generating and manipulating ONNX graphs, especially with `gen_processing_models`.","package":"onnx"},{"reason":"Needed for converting Hugging Face tokenizers into ONNX custom operators using `gen_processing_models`.","package":"transformers","optional":true},{"reason":"Commonly used for array manipulation with ONNX Runtime inputs/outputs.","package":"numpy","optional":true}],"imports":[{"note":"Used to register the custom operators library with ONNX Runtime sessions.","symbol":"get_library_path","correct":"from onnxruntime_extensions import get_library_path"},{"note":"Primary API for converting Hugging Face data processing classes (like tokenizers) into ONNX processing graphs.","symbol":"gen_processing_models","correct":"from onnxruntime_extensions import gen_processing_models"},{"note":"Used to wrap an ONNX model, making it callable like a Python function for inference.","symbol":"OrtPyFunction","correct":"from onnxruntime_extensions import OrtPyFunction"},{"note":"`PyOrtFunction` is the correct name for wrapping a model or custom op for inference from a file or model definition; `OrtFunction` may only have appeared in older examples or internal contexts.","wrong":"from onnxruntime_extensions import OrtFunction","symbol":"PyOrtFunction","correct":"from onnxruntime_extensions import PyOrtFunction"},{"note":"Decorator for defining custom operators using Python functions.","symbol":"onnx_op","correct":"from onnxruntime_extensions import onnx_op"}],"quickstart":{"code":"import onnxruntime as ort\nfrom onnxruntime_extensions import get_library_path, gen_processing_models, OrtPyFunction\n\n# 1. Register the custom operators library with ONNX Runtime session options\nso = ort.SessionOptions()\nso.register_custom_ops_library(get_library_path())\n\ntry:\n    from transformers import AutoTokenizer  # pip install transformers\n\n    # 2. Convert a Hugging Face tokenizer to an ONNX processing model\n    hf_tokenizer = AutoTokenizer.from_pretrained(\"bert-base-uncased\")\n    # gen_processing_models returns a pair: pre-processing model (index 0)\n    # and post-processing model (index 1, if available)\n    tokenizer_onnx_model = OrtPyFunction(gen_processing_models(hf_tokenizer, pre_kwargs={})[0])\n\n    # 3. Prepare input and run inference with the ONNX tokenizer model\n    input_text = [\"Hello, ONNX Runtime Extensions!\"]\n    # The tokenizer model returns the token IDs for each input string\n    input_ids = tokenizer_onnx_model(input_text)\n\n    print(f\"Original Text: {input_text}\")\n    print(f\"Token IDs: {input_ids}\")\n\n    # 4. To run a downstream ONNX model that uses the custom ops, pass the\n    # SessionOptions created above when building the inference session:\n    # sess = ort.InferenceSession(\"your_model.onnx\", so)\n    # model_outputs = sess.run(None, {\"model_input_name\": input_ids})\n    # print(f\"Model outputs: {model_outputs}\")\n\n    print(\"Quickstart demonstrated converting a Hugging Face tokenizer to ONNX custom operators.\")\nexcept ImportError:\n    print(\"Please install 'transformers' for the full quickstart example: pip install transformers\")\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")","lang":"python","description":"This quickstart demonstrates the core functionality of onnxruntime-extensions: converting a Hugging Face tokenizer into an ONNX graph with custom operators, and preparing an ONNX Runtime session to use these extensions. It shows how to set up `SessionOptions` to register the custom operators library and then use `gen_processing_models` to create an ONNX representation of a tokenizer. The resulting ONNX tokenizer model can then be used for pre-processing text data."},"warnings":[{"fix":"Ensure your models or post-processing logic can handle `int64` token ID inputs. Inspect the output types of your generated ONNX processing models.","message":"The `gen_processing_models` API was modified in v0.10.0 to unify tokenizer output data types to `int64`. This might require adjustments if your downstream models or processing steps expected `int32` outputs from tokenizers.","severity":"breaking","affected_versions":">=0.10.0"},{"fix":"Upgrade your `transformers` library to version 4.45 or higher if encountering issues with tokenizer conversion. If compatibility with older `transformers` is critical, consider using `onnxruntime-extensions <0.13.0`.","message":"Version 0.13.0 introduced support for the latest Hugging Face tokenization JSON format (`transformers>=4.45`). Older `transformers` versions might produce tokenizer JSONs that are incompatible with newer `onnxruntime-extensions` for conversion.","severity":"breaking","affected_versions":">=0.13.0"},{"fix":"Explicitly install `onnx` (`pip install onnx`) alongside `onnxruntime-extensions`.","message":"When using the `onnxruntime_extensions` Python package for model processing (e.g., with `gen_processing_models`), the `onnx` package is a required peer dependency. Without it, graph manipulation functionality may fail.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For C/C++ integrations, be prepared for potential API adjustments with new releases and follow GitHub releases closely for C API changes. For Python this is generally abstracted away, but underlying C API changes can still affect behavior.","message":"The C APIs provided by `onnxruntime-extensions` are considered experimental and are subject to change between releases, which may impact applications linking directly against the native library.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Try installing from source. Ensure you have a compatible C++ compiler toolchain (e.g., `gcc` >= 8.0 or `clang` for Linux/macOS), then run: `python -m pip install git+https://github.com/microsoft/onnxruntime-extensions.git`.","cause":"This usually occurs on less common architectures (e.g., ARM-based processors) or specific Python versions for which pre-built wheels are not available on PyPI.","error":"error: no matching distribution found for onnxruntime-extensions"},{"fix":"Inspect the input requirements of your ONNX model or custom operator. Use `model.graph.input` (for ONNX models) or the documentation for custom ops to determine the expected input shape and type. Reshape your NumPy array inputs using `np.reshape()` or `np.expand_dims()` to match.","cause":"The input tensor provided to an ONNX model or custom operator has an incorrect number of dimensions (rank) or an incompatible shape compared to what the ONNX graph expects.","error":"[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid rank for input: ... Got: X Expected: Y"},{"fix":"For CUDA, ensure the `CUDA_PATH` environment variable points to your CUDA toolkit installation. For general DLL issues, try reinstalling `onnxruntime` and `onnxruntime-extensions` in a clean environment, and ensure your system's Visual C++ Redistributables are up to date. If using Conda, try `conda install -c conda-forge onnxruntime` before `pip install onnxruntime-extensions`.","cause":"This error on Windows typically indicates missing or incompatible dependencies for the underlying native library. For CUDA-enabled builds, `CUDA_PATH` might be unset or incorrect; for Conda, environment issues.","error":"ImportError: DLL load failed while importing onnxruntime_extensions: A dynamic link library (DLL) initialization routine failed."},{"fix":"Explicitly cast your NumPy array inputs to the expected data type, e.g. `input_array.astype(np.float32)`. ONNX Runtime typically expects `float32` (single precision) for floating-point inputs.","cause":"The data type of the input tensor (e.g., `np.float64`) does not match the expected data type of the ONNX model or custom operator (e.g., `float32`).","error":"[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type."}]}