Hugging Face Transformers

5.2.0 · active · verified Sat Feb 28

The central model-definition framework for state-of-the-art ML models across text, vision, audio, video, and multimodal tasks. It provides pretrained model weights, tokenizers, pipelines, and training APIs, with 400+ model architectures and 750k+ checkpoints on the Hub.

Major version note: v5 was released in late 2025, the first major release in 5 years. v5 is PyTorch-only (TensorFlow and Flax/JAX support was removed) and requires Python 3.10+. As of Feb 2026, pip install transformers installs v5; v4.57.x is the last v4 release.

Warnings

Install
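
Because v5 requires Python 3.10+ and a plain pip install transformers now resolves to v5, it can help to confirm what an environment actually contains before running older code. The following is a minimal sketch using only the standard library; check_environment is a hypothetical helper name, not part of transformers.

```python
import sys
from importlib import metadata


def check_environment(min_python=(3, 10), package="transformers"):
    """Return (python_ok, installed_version, major_version).

    installed_version and major_version are None when the package
    is not installed in the current environment.
    """
    # Compare only (major, minor) of the running interpreter.
    python_ok = sys.version_info[:2] >= min_python
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return python_ok, None, None
    # Assumes the version string starts with an integer major component.
    major = int(version.split(".")[0])
    return python_ok, version, major


python_ok, version, major = check_environment()
print("Python OK for v5:", python_ok)
print("transformers version:", version, "| major:", major)
```

A major value of 4 would indicate that the environment is still pinned to the v4 line.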

Imports

Quickstart

pipeline() handles preprocessing, inference, and postprocessing in a single call. For quantization in v5, pass a BitsAndBytesConfig via quantization_config; passing load_in_4bit=True directly to from_pretrained() has been removed.

from transformers import pipeline

# Text generation
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
result = generator("The future of AI is", max_new_tokens=50)
print(result[0]['generated_text'])

# Embeddings / feature extraction
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state[:, 0, :]  # [CLS] token
print(embeddings.shape)  # torch.Size([1, 768])

# Quantized inference (v5 pattern)
# Needs the separate bitsandbytes package; the Llama checkpoint is gated on the Hub.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=quant_config,  # replaces the removed load_in_4bit=True kwarg
    device_map="auto",
)
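
For code written against v4 that still passes load_in_4bit=True straight to from_pretrained(), the migration can be sketched as separating the quantization flags from the remaining keyword arguments. This is an illustrative, dependency-free sketch: split_legacy_quant_kwargs and LEGACY_QUANT_KEYS are hypothetical names, and treating load_in_8bit the same way as load_in_4bit is an assumption not stated above.

```python
# Flags that used to be accepted directly by from_pretrained().
# load_in_8bit is included by assumption; only load_in_4bit is named above.
LEGACY_QUANT_KEYS = ("load_in_4bit", "load_in_8bit")


def split_legacy_quant_kwargs(kwargs):
    """Split kwargs into (quantization flags, everything else)."""
    quant = {k: kwargs[k] for k in LEGACY_QUANT_KEYS if k in kwargs}
    rest = {k: v for k, v in kwargs.items() if k not in LEGACY_QUANT_KEYS}
    return quant, rest


old_kwargs = {"load_in_4bit": True, "device_map": "auto"}
quant, rest = split_legacy_quant_kwargs(old_kwargs)
print(quant)  # {'load_in_4bit': True}
print(rest)   # {'device_map': 'auto'}
```

The quant dictionary can then be passed as BitsAndBytesConfig(**quant), and the resulting object supplied as quantization_config alongside the remaining keyword arguments.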
