Hugging Face Transformers

5.2.0 · verified Tue May 12 · auth: no · python install: stale · quickstart: stale

The central model-definition framework for state-of-the-art ML models across text, vision, audio, video, and multimodal tasks. Provides pretrained model weights, tokenizers, pipelines, and training APIs on top of PyTorch, with 400+ model architectures and 750k+ checkpoints on the Hub. MAJOR VERSION NOTE: v5, released late 2025, is the first major release in 5 years and is PyTorch-only (TensorFlow/Flax/JAX removed). pip install transformers installs v5 as of Feb 2026; v4.57.x is the last v4 release. v5 requires Python 3.10+.

pip install 'transformers[torch]'
error ModuleNotFoundError: No module named 'transformers.modeling_tf_utils'
cause Transformers v5 removed all TensorFlow and Flax backend support, including the `modeling_tf_utils` module, making it a PyTorch-only library.
fix Migrate your code to PyTorch-compatible classes and functions, or pin a v4 release (pip install 'transformers<5') if TensorFlow support is essential.
error ImportError: transformers requires Python 3.10 or higher.
cause Transformers v5 explicitly requires Python version 3.10 or newer, and your current environment is running an older Python version.
fix Upgrade Python to 3.10 or higher, or pin a v4 release (pip install 'transformers<5') if you must stay on an older Python.
error ValueError: Loading a model with custom code requires passing `trust_remote_code=True`.
cause You are attempting to load a model or tokenizer from the Hugging Face Hub that includes custom Python code, which `transformers` blocks by default for security reasons.
fix Add trust_remote_code=True to your from_pretrained() call (e.g., AutoModel.from_pretrained('org/model', trust_remote_code=True)), but only if you understand and trust the source of the custom code.
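
A minimal sketch of a safer load: pin revision to the commit you audited so the remote code cannot change underneath you (the repo id and hash below are placeholders):

from transformers import AutoModel

# Placeholder repo id and commit hash; substitute the custom-code model
# you actually audited. revision accepts a branch, tag, or commit hash.
model = AutoModel.from_pretrained(
    "org/custom-model",
    trust_remote_code=True,
    revision="0123abc",  # placeholder: prefer a full commit hash in production
)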
error OSError: Can't load weights for 'MODEL_NAME'. If you were trying to load it from 'https://huggingface.co/...' or from local files
cause The specified model or its configuration/weights could not be found or accessed on the Hugging Face Hub or locally, possibly due to a typo, network issues, or a private model without proper authentication.
fix Double-check the model name for typos, ensure you have an active internet connection, and for a private model make sure you are logged in (huggingface-cli login) or pass a valid authentication token.
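
If the model is private or gated, a hedged sketch of the auth flow (the repo id is a placeholder; login() comes from huggingface_hub, which transformers uses for downloads):

from huggingface_hub import login
from transformers import AutoModel

login()  # prompts for a token and caches it; CLI equivalent: huggingface-cli login
model = AutoModel.from_pretrained("org/private-model")  # placeholder repo id
# Or pass a token explicitly without a stored login ("hf_..." is a placeholder):
# model = AutoModel.from_pretrained("org/private-model", token="hf_...")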
breaking v5 drops TensorFlow and Flax/JAX support entirely. TFAutoModel, FlaxAutoModel, and all TF/Flax model classes are removed. pip install transformers now installs v5 — existing code using TF/Flax will break on import.
fix Migrate to PyTorch. If TF/Flax is required, pin: pip install 'transformers<5'. Last v4 release: 4.57.3.
breaking load_in_4bit and load_in_8bit as direct kwargs to from_pretrained() are removed in v5. Must use BitsAndBytesConfig.
fix from transformers import BitsAndBytesConfig; model = AutoModel.from_pretrained(id, quantization_config=BitsAndBytesConfig(load_in_4bit=True))
breaking AutoFeatureExtractor removed in v5. Use AutoImageProcessor. Fast/slow image processor distinction also eliminated — only 'fast' variants (requires torchvision) remain.
fix Replace AutoFeatureExtractor with AutoImageProcessor. Install torchvision if missing.
breaking tokenizer.encode_plus() deprecated in v5. tokenization_utils and tokenization_utils_fast module paths removed and redirected.
fix Replace encode_plus() with direct tokenizer() call: tokenizer(text, truncation=True, padding='max_length', max_length=128, return_tensors='pt')
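
A before/after sketch of the migration; both calls return the same BatchEncoding fields (input_ids, attention_mask):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# v4 style (deprecated):
# enc = tokenizer.encode_plus("Hello, world!", truncation=True,
#                             padding="max_length", max_length=128,
#                             return_tensors="pt")

# v5 style: call the tokenizer object directly
enc = tokenizer("Hello, world!", truncation=True, padding="max_length",
                max_length=128, return_tensors="pt")
print(enc["input_ids"].shape)  # torch.Size([1, 128])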
breaking TRANSFORMERS_CACHE environment variable removed in v5. Cache location now controlled by HF_HOME.
fix Replace: export TRANSFORMERS_CACHE=/path with: export HF_HOME=/path
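
When setting the cache location from Python instead of the shell, set HF_HOME before the first transformers/huggingface_hub import, since the cache path is resolved at import time (the path below is a placeholder):

import os

os.environ["HF_HOME"] = "/mnt/hf-cache"  # placeholder path; must be set before import

from transformers import AutoTokenizer  # now caches under /mnt/hf-cache
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")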
breaking Python 3.10+ required in v5. Python 3.9 and below are not supported.
fix Upgrade to Python 3.10+. Or pin transformers<5 for Python 3.9.
gotcha pip install transformers (no extras) installs the package but does NOT install PyTorch. Importing any model then raises 'No module named torch'. This is a constant source of confusion for new users.
fix Always install with extras: pip install 'transformers[torch]'. Or install torch separately first.
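
A quick sanity check, useful in CI, since import transformers succeeds even without torch (backends load lazily):

from transformers.utils import is_torch_available

if not is_torch_available():
    raise RuntimeError("PyTorch backend missing: pip install 'transformers[torch]'")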
gotcha Models are downloaded to ~/.cache/huggingface/hub on first from_pretrained() call. Large models (7B+) can be tens of GB. In CI or containers with limited disk, this causes silent failures or disk full errors.
fix Set HF_HOME to a volume with sufficient space. Pre-download models using snapshot_download() or huggingface-cli download.
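
A sketch of pre-downloading at image build time so the first from_pretrained() is a pure cache hit (model id reused from the quickstart below):

from huggingface_hub import snapshot_download

# Downloads the full repo into the HF cache (honors HF_HOME)
snapshot_download(repo_id="Qwen/Qwen2.5-0.5B-Instruct")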
gotcha device_map='auto' requires accelerate to be installed. Without it, from_pretrained(..., device_map='auto') raises ImportError. Not installed by default with transformers[torch].
fix pip install accelerate alongside transformers.
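
Minimal usage once accelerate is present (the small quickstart model serves as the example):

from transformers import AutoModelForCausalLM

# accelerate places layers across available GPUs, then CPU if needed
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    device_map="auto",
)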
gotcha Building `tokenizers` (a core dependency of `transformers`) from source often fails on musl-based Linux distributions (like Alpine) due to missing C toolchain libraries (e.g., `libgcc_s.so.1`) or Rust compilation issues. Source builds are forced whenever pre-built wheels for the specific Python version and musl architecture are unavailable.
fix Ensure your Alpine environment has `build-base` and `rust` packages installed (e.g., `apk add build-base rust`). For newer Python versions or if issues persist, consider using official Python images based on glibc (e.g., `python:3.x-slim-bullseye` instead of `python:3.x-alpine`) or explicitly pinning `tokenizers` to a version with a compatible musl wheel if available.
pip install transformers
pip install 'transformers[vision]'
pip install 'transformers[audio]'
pip install 'transformers<5'
| python | os / libc | variant | status | wheel | install | disk |
|--------|-----------|---------|--------|-------|---------|------|
| 3.9 | alpine (musl) | 'transformers<5' | - | - | 4.25s | 251.9M |
| 3.9 | alpine (musl) | transformers | - | - | 4.27s | 251.9M |
| 3.9 | alpine (musl) | audio | - | - | - | - |
| 3.9 | alpine (musl) | torch | - | - | - | - |
| 3.9 | alpine (musl) | vision | - | - | 4.37s | 269.3M |
| 3.9 | slim (glibc) | 'transformers<5' | - | - | 3.62s | 233M |
| 3.9 | slim (glibc) | transformers | - | - | 3.66s | 233M |
| 3.9 | slim (glibc) | audio | - | - | - | - |
| 3.9 | slim (glibc) | torch | - | - | - | - |
| 3.9 | slim (glibc) | vision | - | - | 4.22s | 251M |
| 3.10 | alpine (musl) | 'transformers<5' | - | - | 4.50s | 244.6M |
| 3.10 | alpine (musl) | transformers | - | - | 3.83s | 243.4M |
| 3.10 | alpine (musl) | audio | - | - | - | - |
| 3.10 | alpine (musl) | torch | - | - | - | - |
| 3.10 | alpine (musl) | vision | - | - | 3.94s | 263.2M |
| 3.10 | slim (glibc) | 'transformers<5' | - | - | 2.92s | 223M |
| 3.10 | slim (glibc) | transformers | - | - | 2.66s | 222M |
| 3.10 | slim (glibc) | audio | - | - | 2.85s | 671M |
| 3.10 | slim (glibc) | torch | - | - | 14.20s | 4.8G |
| 3.10 | slim (glibc) | vision | - | - | 13.76s | 4.8G |
| 3.11 | alpine (musl) | 'transformers<5' | - | - | 6.71s | 274.8M |
| 3.11 | alpine (musl) | transformers | - | - | 5.82s | 272.6M |
| 3.11 | alpine (musl) | audio | - | - | - | - |
| 3.11 | alpine (musl) | torch | - | - | - | - |
| 3.11 | alpine (musl) | vision | - | - | 6.02s | 291.7M |
| 3.11 | slim (glibc) | 'transformers<5' | - | - | 5.46s | 253M |
| 3.11 | slim (glibc) | transformers | - | - | 4.79s | 251M |
| 3.11 | slim (glibc) | audio | - | - | 5.21s | 723M |
| 3.11 | slim (glibc) | torch | - | - | 18.31s | 4.8G |
| 3.11 | slim (glibc) | vision | - | - | 22.71s | 4.9G |
| 3.12 | alpine (musl) | 'transformers<5' | - | - | 5.95s | 259.1M |
| 3.12 | alpine (musl) | transformers | - | - | 5.40s | 257.1M |
| 3.12 | alpine (musl) | audio | - | - | - | - |
| 3.12 | alpine (musl) | torch | - | - | - | - |
| 3.12 | alpine (musl) | vision | - | - | 5.55s | 276.3M |
| 3.12 | slim (glibc) | 'transformers<5' | - | - | 6.09s | 237M |
| 3.12 | slim (glibc) | transformers | - | - | 5.46s | 235M |
| 3.12 | slim (glibc) | audio | - | - | 6.09s | 700M |
| 3.12 | slim (glibc) | torch | - | - | 17.99s | 4.8G |
| 3.12 | slim (glibc) | vision | - | - | 21.75s | 4.9G |
| 3.13 | alpine (musl) | 'transformers<5' | - | - | 4.95s | 258.3M |
| 3.13 | alpine (musl) | transformers | - | - | 4.70s | 256.4M |
| 3.13 | alpine (musl) | audio | - | - | - | - |
| 3.13 | alpine (musl) | torch | - | - | - | - |
| 3.13 | alpine (musl) | vision | - | - | 4.79s | 275.5M |
| 3.13 | slim (glibc) | 'transformers<5' | - | - | 5.20s | 236M |
| 3.13 | slim (glibc) | transformers | - | - | 4.84s | 234M |
| 3.13 | slim (glibc) | audio | - | - | - | - |
| 3.13 | slim (glibc) | torch | - | - | 16.61s | 4.8G |
| 3.13 | slim (glibc) | vision | - | - | 16.84s | 4.9G |

pipeline() handles tokenization, model loading, and pre/post-processing automatically. For quantization in v5, use BitsAndBytesConfig; passing load_in_4bit=True directly to from_pretrained() is removed.

from transformers import pipeline

# Text generation
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
result = generator("The future of AI is", max_new_tokens=50)
print(result[0]['generated_text'])

# Embeddings / feature extraction
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state[:, 0, :]  # [CLS] token
print(embeddings.shape)  # torch.Size([1, 768])

# Quantized inference (v5 pattern)
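# Assumes bitsandbytes and accelerate are installed
# (pip install bitsandbytes accelerate) and a CUDA GPU is available.
# Note: the meta-llama repo below is gated; accept its license on the Hub
# and authenticate (huggingface-cli login) before loading.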
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    quantization_config=quant_config,
    device_map="auto"
)