Hugging Face Optimum
Optimum is an extension of the Hugging Face Transformers library that provides a framework for integrating third-party Hardware Partner libraries and interfacing with their device-specific functionality. It focuses on optimizing models for various accelerators and runtimes, enabling faster training and inference for Transformer-based models. The library is actively developed; its current version is 2.1.0, and it receives frequent updates and new features.
Warnings
- breaking Breaking Change in v2.0.0: ONNX integration (export and ONNX Runtime inference) was moved to a separate package, `optimum-onnx`. Users upgrading from v1.x must install `optimum-onnx` (e.g., `pip install "optimum-onnx[onnxruntime]"`) to retain ONNX functionality.
- deprecated `AutoGPTQ` functionality has been fully deprecated in v2.1.0 in favor of `GPTQModel`. Users should migrate to `GPTQModel` for GPTQ quantization and inference.
- deprecated In v2.0.0, support for TF Lite, BetterTransformer, and ONNX Runtime Training was deprecated and subsequently removed or moved to other specialized packages. TensorFlow model export was also removed.
- gotcha From v1.25.0 onwards, the `export=True` argument to `ORTModelForCausalLM.from_pretrained` (and the other `ORTModel` classes) is optional and usually inferred automatically. Passing it explicitly is often unnecessary in newer versions and, if mishandled, can lead to unexpected behavior.
- gotcha Compatibility issues can arise between specific `optimum` versions and newer `transformers` versions, particularly regarding internal module imports like `TF2_WEIGHTS_NAME` (fixed in v2.1.0). Ensure your `optimum` and `transformers` installations are compatible.
Install
- pip install optimum
- pip install "optimum[onnxruntime]"
- pip install optimum-onnx
Imports
- ORTModelForCausalLM
from optimum.onnxruntime import ORTModelForCausalLM
- ORTModelForSequenceClassification
from optimum.onnxruntime import ORTModelForSequenceClassification
- pipeline
from optimum.onnxruntime import pipeline
Quickstart
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM, pipeline
# Load an already optimized ONNX Runtime model from the Hugging Face Hub
model_id = "optimum/gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id)
# Create a pipeline using the ONNX Runtime optimized model
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
# Generate text
prompt = "My name is Philipp"
result = text_generator(prompt, max_new_tokens=10)
print(result)