OnnxSlim
OnnxSlim is an open-source toolkit for optimizing ONNX (Open Neural Network Exchange) models. It reduces model size and improves inference speed through techniques such as node elimination, constant folding, and shape inference. As of version 0.1.91, it is under active development with frequent releases, aiming to provide a robust solution for model deployment.
Warnings
- gotcha Always verify the slimmed model's output and performance. While OnnxSlim aims for lossless optimization, aggressive slimming or specific model architectures can sometimes subtly alter behavior or reduce compatibility with certain ONNX runtimes or hardware accelerators. Extensive testing after optimization is crucial.
- gotcha Models that store weights as external data files (common for very large models) require careful handling. OnnxSlim primarily operates on the `.onnx` protobuf definition. After slimming, ensure that any associated external data files are correctly moved or regenerated alongside the new slimmed `.onnx` file, maintaining their relative paths if applicable.
- gotcha For models with dynamic input shapes, incorrect shape inference by `onnxslim` can lead to runtime errors. While `onnxslim` includes shape inference, complex dynamic scenarios might require explicit configuration.
Install
- pip install onnxslim
Imports
- slim
from onnxslim import slim
Quickstart
import os

import onnx
from onnx import TensorProto
from onnx.helper import (make_model, make_node, make_graph,
                         make_tensor_value_info, make_opsetid)
from onnxslim import slim

# Create a minimal dummy ONNX model for demonstration.
# In a real-world scenario, you would load your existing model instead:
#   input_model_path = "path/to/your/model.onnx"
# If you don't have one, this creates a simple Add operation model:
input_name = 'input'
output_name = 'output'
input_tensor = make_tensor_value_info(input_name, TensorProto.FLOAT, [1, 2, 3])
output_tensor = make_tensor_value_info(output_name, TensorProto.FLOAT, [1, 2, 3])
node = make_node('Add', [input_name, input_name], [output_name])  # output = input + input
graph = make_graph([node], 'simple_add_graph', [input_tensor], [output_tensor])
dummy_model = make_model(graph, opset_imports=[make_opsetid("", 13)])  # opset 13 is widely supported
input_model_path = "dummy_model_to_slim.onnx"
output_model_path = "dummy_model_slimmed.onnx"
onnx.save(dummy_model, input_model_path)
print(f"Dummy ONNX model created at {input_model_path}")
try:
    # Perform the slimming; slim() returns an ONNX ModelProto object.
    slimmed_model_proto = slim(input_model_path)
    # Save it with onnx.save:
    onnx.save(slimmed_model_proto, output_model_path)
    print(f"Model successfully slimmed and saved to {output_model_path}")
    # Optional: load and verify the slimmed model
    # slimmed_model_loaded = onnx.load(output_model_path)
    # print(f"Loaded slimmed model with graph name: {slimmed_model_loaded.graph.name}")
except Exception as e:
    print(f"An error occurred during slimming: {e}")
finally:
    # Clean up dummy files
    for path in (input_model_path, output_model_path):
        if os.path.exists(path):
            os.remove(path)
    print("Cleaned up dummy ONNX files.")