Hugging Face Optimum

2.1.0 · active · verified Sat Apr 11

Optimum is an extension of the Hugging Face Transformers library that provides a framework for integrating third-party libraries from hardware partners and interfacing with their specific functionality. It focuses on optimizing models for various accelerators and runtimes, enabling faster training and inference for Transformer-based models. The library is actively developed and receives frequent updates and new features.
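Before running the quickstart below, Optimum needs to be installed with its ONNX Runtime dependencies. The extras name below follows the convention used in Optimum's installation documentation; verify it against the docs for your version:

```shell
# Install Optimum together with its ONNX Runtime backend
pip install "optimum[onnxruntime]"
```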


Quickstart

This quickstart demonstrates how to load an existing ONNX Runtime optimized model from the Hugging Face Hub and use it with a Hugging Face pipeline for accelerated text generation. It utilizes `ORTModelForCausalLM` and `optimum.onnxruntime.pipeline` for seamless integration and inference.

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM, pipeline

# Load an already optimized ONNX Runtime model from the Hugging Face Hub
model_id = "optimum/gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id)

# Create a pipeline using the ONNX Runtime optimized model
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text
prompt = "My name is Philipp"
result = text_generator(prompt, max_new_tokens=10)
print(result)
