IPEX-LLM

2.2.0 · active · verified Mon Apr 13

IPEX-LLM is a PyTorch-based library developed by Intel for optimizing Large Language Models (LLMs) on Intel CPUs and GPUs (XPUs). It provides low-bit (e.g. INT4) optimizations for efficient inference and fine-tuning, leveraging Intel hardware acceleration. The current stable version is 2.2.0, with frequent nightly builds; the project originated as BigDL-LLM and is still released in conjunction with the broader BigDL project.

Warnings

Install
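IPEX-LLM is installed with pip. The commands below follow Intel's install instructions; the `[xpu]` extra and its extra index URL are taken from those docs and may change between releases:

```shell
# CPU-only installation
pip install --pre --upgrade ipex-llm[all]

# Intel GPU (XPU) installation
pip install --pre --upgrade ipex-llm[xpu] \
    --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```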

Imports

Quickstart

This quickstart demonstrates how to load and run an LLM with `ipex_llm.transformers.AutoModelForCausalLM` together with Hugging Face's `AutoTokenizer`. The ipex-llm `AutoModel*` classes are drop-in replacements for their Hugging Face Transformers counterparts and apply low-bit optimization at load time. Ensure `model_id` points to a valid local path or Hugging Face model ID.

from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# load_in_4bit=True applies INT4 quantization on the fly;
# load_in_low_bit="sym_int4" (or "sym_int8", "fp8", ...) gives explicit control
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    trust_remote_code=True
)

# Example for text generation
prompt = "What is the capital of France?"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
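For intuition about what `load_in_4bit` does: ipex-llm's default `sym_int4` scheme maps each weight to a 4-bit signed integer with a shared scale. A minimal pure-Python sketch of the idea follows (an illustration only, not ipex-llm's actual packed-kernel implementation):

```python
def quantize_sym_int4(weights):
    """Symmetric int4 quantization: map floats to integers in [-7, 7]
    using a single scale derived from the largest magnitude."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int4 codes."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.07, -0.91]
q, scale = quantize_sym_int4(weights)
approx = dequantize(q, scale)

# Every code fits in 4 signed bits, and the round-trip error is bounded by scale/2
assert all(-7 <= v <= 7 for v in q)
assert all(abs(a - w) <= scale / 2 + 1e-9 for a, w in zip(approx, weights))
```

Real low-bit kernels additionally pack two int4 codes per byte and quantize per group of weights rather than per tensor, but the accuracy/size trade-off is the same: 4-bit storage at the cost of a bounded rounding error.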
