Hugging Face Transformers
The central model-definition framework for state-of-the-art ML models across text, vision, audio, video, and multimodal tasks. Provides pretrained model weights, tokenizers, pipelines, and training APIs, with 400+ model architectures and 750k+ checkpoints on the Hub. MAJOR VERSION NOTE: v5, released late 2025, is the first major release in five years and is PyTorch-only (TensorFlow/Flax/JAX removed). As of Feb 2026, pip install transformers installs v5; v4.57.x is the final v4 release. v5 requires Python 3.10+.
Warnings
- breaking v5 drops TensorFlow and Flax/JAX support entirely. TFAutoModel, FlaxAutoModel, and all TF/Flax model classes are removed. pip install transformers now installs v5 — existing code using TF/Flax will break on import.
- breaking load_in_4bit and load_in_8bit as direct kwargs to from_pretrained() are removed in v5. Must use BitsAndBytesConfig.
- breaking AutoFeatureExtractor removed in v5. Use AutoImageProcessor. Fast/slow image processor distinction also eliminated — only 'fast' variants (requires torchvision) remain.
- breaking tokenizer.encode_plus() is deprecated in v5; call the tokenizer directly instead, e.g. tokenizer(text, return_tensors='pt'). The tokenization_utils and tokenization_utils_fast module paths are removed and redirected.
- breaking TRANSFORMERS_CACHE environment variable removed in v5. Cache location now controlled by HF_HOME.
- breaking Python 3.10+ required in v5. Python 3.9 and below are not supported.
- gotcha pip install transformers (no extras) installs the package but does NOT install PyTorch. Importing any model then raises 'No module named torch'. This is a constant source of confusion for new users.
- gotcha Models are downloaded to ~/.cache/huggingface/hub on first from_pretrained() call. Large models (7B+) can be tens of GB. In CI or containers with limited disk, this causes silent failures or disk full errors.
- gotcha device_map='auto' requires accelerate to be installed. Without it, from_pretrained(..., device_map='auto') raises ImportError. Not installed by default with transformers[torch].
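The cache warnings above can be sketched with the standard library; the `/data/hf-cache` path is a placeholder:

```python
import os
from pathlib import Path

# v5 ignores TRANSFORMERS_CACHE; set HF_HOME *before* importing
# transformers, since the cache location is resolved at import time.
os.environ["HF_HOME"] = "/data/hf-cache"

# Downloads then land under $HF_HOME/hub
# (default when HF_HOME is unset: ~/.cache/huggingface/hub).
hub_dir = Path(os.environ["HF_HOME"]) / "hub"
print(hub_dir)
```

Pointing HF_HOME at a large volume avoids the disk-full failures noted above for 7B+ models in CI.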
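A minimal pre-flight check for the device_map gotcha, using only the standard library (the helper name is my own, not part of the transformers API):

```python
import importlib.util

def has_accelerate() -> bool:
    # from_pretrained(..., device_map="auto") raises ImportError
    # unless the accelerate package is importable.
    return importlib.util.find_spec("accelerate") is not None

# Fall back to default (CPU) placement when accelerate is missing.
device_map = "auto" if has_accelerate() else None
```

This keeps a script importable on machines where accelerate was never installed, instead of failing inside from_pretrained().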
Install
- pip install transformers[torch] (recommended; includes PyTorch)
- pip install transformers (core only; does NOT install PyTorch)
- pip install transformers[vision]
- pip install transformers[audio]
- pip install 'transformers<5' (pin to the last v4 release)
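Code that must run on both the v4 pin and v5 can branch on the major version; a small sketch (the helper name is illustrative):

```python
def is_v5_or_newer(version: str) -> bool:
    # transformers uses semantic-style version strings
    # ("4.57.1", "5.0.0", ...), so the major version is the
    # first dot-separated component.
    return int(version.split(".")[0]) >= 5

print(is_v5_or_newer("4.57.1"))  # False
print(is_v5_or_newer("5.0.0"))   # True
```

At runtime the string would come from transformers.__version__.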
Imports
- pipeline
from transformers import pipeline
- AutoModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer
- AutoImageProcessor (v5)
from transformers import AutoImageProcessor
Quickstart
from transformers import pipeline
# Text generation
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
result = generator("The future of AI is", max_new_tokens=50)
print(result[0]['generated_text'])
# Embeddings / feature extraction
from transformers import AutoTokenizer, AutoModel
import torch
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state[:, 0, :] # [CLS] token
print(embeddings.shape) # torch.Size([1, 768])
# Quantized inference (v5 pattern)
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=quant_config,
    device_map="auto",
)
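As a rough sanity check before downloading a model like the one above: 4-bit quantization stores weights at about half a byte per parameter (plus some overhead for quantization constants), so the figure below is approximate:

```python
def approx_4bit_weight_gb(num_params: float) -> float:
    # 4-bit (NF4/INT4) weights take ~0.5 bytes per parameter.
    return num_params * 0.5 / 1e9

# A 3B-parameter model needs roughly 1.5 GB for quantized weights.
print(f"{approx_4bit_weight_gb(3e9):.1f} GB")  # 1.5 GB
```

Actual memory use is higher once activations, the KV cache, and quantization overhead are included.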