ONNX Runtime Extensions

0.15.2 · active · verified Thu Apr 16

ONNX Runtime Extensions is a C/C++ library that extends ONNX models and ONNX Runtime inference through the custom operator ABI. It provides a set of custom operators for common pre- and post-processing tasks in vision, text, and audio models. The library has bindings for multiple languages, including Python, Java, and C#, supports mobile platforms, and is currently at version 0.15.2 with a continuous release cadence.

Quickstart

This quickstart demonstrates the core functionality of onnxruntime-extensions: converting a Hugging Face tokenizer into an ONNX graph with custom operators, and preparing an ONNX Runtime session to use these extensions. It showcases how to set up `SessionOptions` to register the custom operations library and then use `gen_processing_models` to create an ONNX representation of a tokenizer. The resulting ONNX tokenizer model can then be used for pre-processing text data.

import onnxruntime as ort
from onnxruntime_extensions import get_library_path, gen_processing_models, OrtPyFunction
from transformers import AutoTokenizer  # pip install transformers

# 1. Register the custom operators library
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())

# 2. Convert a Hugging Face tokenizer to an ONNX processing model
try:
    hf_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    # gen_processing_models returns a pair: the pre-processing model
    # (index 0) and the post-processing model (index 1, when available)
    tokenizer_onnx_model = OrtPyFunction(gen_processing_models(hf_tokenizer, pre_kwargs={})[0])

    # 3. Tokenize text by running the ONNX tokenizer model
    input_text = ["Hello, ONNX Runtime Extensions!"]
    input_ids = tokenizer_onnx_model(input_text)

    print(f"Original text: {input_text}")
    print(f"Token IDs: {input_ids}")

    # 4. To feed the token IDs into a real model, create an InferenceSession
    # with the same SessionOptions so the custom operators are available:
    # sess = ort.InferenceSession("your_model.onnx", so)
    # model_outputs = sess.run(None, {"model_input_name": input_ids})
except ImportError:
    print("Please install 'transformers' for the full quickstart example: pip install transformers")
