ONNX Runtime Extensions
ONNX Runtime Extensions is a C/C++ library that extends the capabilities of ONNX models and inference with ONNX Runtime via Custom Operator ABIs. It provides a set of custom operators to support common pre- and post-processing tasks for vision, text, and audio models. The library supports multiple languages and platforms, including Python, Java, C#, and mobile platforms, and is currently at version 0.15.2, with a continuous release cadence.
Common errors
- error: no matching distribution found for onnxruntime-extensions
  cause: This usually occurs on less common architectures (e.g., ARM-based processors) or specific Python versions for which pre-built wheels are not available on PyPI.
  fix: Install from source. Ensure you have a compatible C++ compiler toolchain (e.g., `gcc` >= 8.0 or `clang` for Linux/macOS), then run: `python -m pip install git+https://github.com/microsoft/onnxruntime-extensions.git`.
- error: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid rank for input: ... Got: X Expected: Y
  cause: The input tensor provided to an ONNX model or custom operator has an incorrect number of dimensions (rank) or an incompatible shape compared to what the ONNX graph expects.
  fix: Inspect the input requirements of your ONNX model or custom operator. Use `model.graph.input` (for ONNX models) or the custom op's documentation to determine the expected input shape and type, then reshape your NumPy inputs with `np.reshape()` or `np.expand_dims()` to match.
- error: ImportError: DLL load failed while importing onnxruntime_extensions: A dynamic link library (DLL) initialization routine failed.
  cause: On Windows this typically indicates missing or incompatible dependencies for the underlying native library. For CUDA-enabled builds, `CUDA_PATH` may be unset or incorrect; in Conda, the environment itself may be misconfigured.
  fix: For CUDA, ensure the `CUDA_PATH` environment variable points to your CUDA toolkit installation. For general DLL issues, reinstall `onnxruntime` and `onnxruntime-extensions` in a clean environment and make sure your system's Visual C++ Redistributables are up to date. If using Conda, try `conda install -c conda-forge onnxruntime` before `pip install onnxruntime-extensions`.
- error: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type.
  cause: The data type of the input tensor (e.g., `np.float64`) does not match the data type the ONNX model or custom operator expects (e.g., `float32`).
  fix: Explicitly cast your NumPy inputs to the expected type, e.g. `input_array.astype(np.float32)`. ONNX Runtime typically expects `float32` (single precision) for floats.
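Both the rank and the data-type errors above are usually fixed with a couple of NumPy calls before feeding data to the session. A minimal sketch (the expected shape `[1, 3]` is illustrative, not from any particular model):

```python
import numpy as np

# Raw data: rank 1, float64 -- a common source of both INVALID_ARGUMENT errors
raw = np.array([0.1, 0.2, 0.3])

# Add a leading batch dimension (rank 1 -> rank 2) and cast to float32,
# the dtype ONNX Runtime usually expects for float tensors
fixed = np.expand_dims(raw, axis=0).astype(np.float32)

print(fixed.shape, fixed.dtype)  # (1, 3) float32
```

Checking `fixed.shape` and `fixed.dtype` against `model.graph.input` before calling `sess.run` catches these mismatches early.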
Warnings
- breaking: The `gen_processing_models` API was modified in v0.10.0 to unify tokenizer output data types to `int64`. This may require adjustments if your downstream models or processing steps expected `int32` outputs from tokenizers.
- breaking: Version 0.13.0 introduced support for the latest Hugging Face tokenization JSON format (`transformers>=4.45`). Older `transformers` versions may produce tokenizer JSONs that are incompatible with newer `onnxruntime-extensions` for conversion.
- gotcha: When using the `onnxruntime_extensions` Python package for model processing (e.g., with `gen_processing_models`), the `onnx` package is a required peer dependency; without it, graph-manipulation functionality may fail.
- gotcha: The C APIs provided by `onnxruntime-extensions` are considered experimental and subject to change between releases, which may impact applications linking directly against the native library.
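If a downstream model was exported before the v0.10.0 change and still declares `int32` token-ID inputs, cast explicitly at the boundary. A small NumPy sketch (the ID values below are illustrative BERT-style tokens, not real tokenizer output):

```python
import numpy as np

# Tokenizers converted with gen_processing_models >= 0.10.0 emit int64 IDs
token_ids = np.array([[101, 7592, 102]], dtype=np.int64)

# Downstream graph still expects int32? Cast before sess.run.
ids32 = token_ids.astype(np.int32)

print(ids32.dtype)  # int32
```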
Install
- stable: `pip install onnxruntime-extensions`
- nightly: `pip install --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ onnxruntime-extensions`
- from source: `python -m pip install git+https://github.com/microsoft/onnxruntime-extensions.git`
Imports
- get_library_path
from onnxruntime_extensions import get_library_path
- gen_processing_models
from onnxruntime_extensions import gen_processing_models
- OrtPyFunction
from onnxruntime_extensions import OrtPyFunction
- PyOrtFunction (legacy name for OrtPyFunction)
from onnxruntime_extensions import PyOrtFunction
- onnx_op
from onnxruntime_extensions import onnx_op
Quickstart
import onnxruntime as ort
from onnxruntime_extensions import get_library_path, gen_processing_models, OrtPyFunction
from transformers import AutoTokenizer  # pip install transformers

# 1. Register the custom operators library
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())

# 2. Convert a Hugging Face tokenizer to an ONNX processing model
try:
    hf_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    # gen_processing_models returns (pre-processing model, post-processing model);
    # index 0 is the pre-processing (tokenizer) model
    tokenizer_onnx_model = OrtPyFunction(gen_processing_models(hf_tokenizer, pre_kwargs={})[0])

    # 3. Prepare input and run the ONNX tokenizer model
    input_text = ["Hello, ONNX Runtime Extensions!"]
    input_ids = tokenizer_onnx_model(input_text)
    print(f"Original Text: {input_text}")
    print(f"Token IDs: {input_ids}")

    # 4. (Optional) Feed the token IDs into a downstream ONNX model.
    # With a real model file you would do:
    # sess = ort.InferenceSession("your_model.onnx", so)
    # model_outputs = sess.run(None, {"model_input_name": input_ids})
    # print(f"Model outputs: {model_outputs}")
    print("Quickstart demonstrated converting a Hugging Face tokenizer to ONNX custom operators.")
except ImportError:
    print("Please install 'transformers' for the full quickstart example: pip install transformers")
except Exception as e:
    print(f"An error occurred: {e}")