FastEmbed Library
FastEmbed is a fast, lightweight, and accurate Python library for generating retrieval embeddings, built on ONNX Runtime for efficient inference. It supports dense text embeddings, sparse embeddings, late-interaction embeddings, and rerankers. The current version is 0.8.0, and the project releases updates frequently.
Warnings
- breaking Python 3.9 is no longer supported starting from v0.8.0. Previously, Python 3.8 was dropped in v0.5.0. Ensure your environment uses Python 3.10 or newer.
- gotcha FastEmbed v0.8.0+ automatically utilizes CUDA if a compatible GPU is detected and `onnxruntime-gpu` is installed. Explicitly setting `cuda=True` is no longer required and may not be honored if the environment is not set up correctly.
- gotcha v0.8.0 pinned specific versions of `onnxruntime` and `pillow`, mainly for Python 3.14 compatibility and security fixes. Users on Python 3.14, or with older transitive dependencies, may run into installation or runtime issues.
- gotcha `local_files_only=True` is meant to prevent downloads, but versions before v0.7.4 could still make network calls when the model was not cached. As of v0.8.0, the `HF_HUB_OFFLINE` environment variable is also respected, which makes offline use more robust.
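The Python-version, dependency, and offline requirements in these warnings can be checked before a model is loaded. Below is a minimal preflight sketch that uses only the standard library. The helper names (`python_supported`, `installed_versions`, `enable_offline_mode`) are hypothetical and not part of FastEmbed. It also assumes that `HF_HUB_OFFLINE` must be set before FastEmbed (and therefore `huggingface_hub`) is imported, because `huggingface_hub` has historically read that variable at import time.

```python
import os
import sys
from importlib import metadata
from typing import Dict, Optional, Sequence


def python_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    # FastEmbed v0.8.0+ requires Python 3.10 or newer
    return tuple(version_info[:2]) >= (3, 10)


def installed_versions(
    packages: Sequence[str] = ("fastembed", "onnxruntime", "pillow"),
) -> Dict[str, Optional[str]]:
    # Map each distribution name to its installed version, or None if absent
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def enable_offline_mode() -> Dict[str, bool]:
    # Assumption: must run before fastembed / huggingface_hub are imported,
    # since the Hub library reads HF_HUB_OFFLINE when it is first imported
    os.environ["HF_HUB_OFFLINE"] = "1"
    # Keyword arguments to pass on to a model constructor
    return {"local_files_only": True}


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("Installed versions:", installed_versions())
    print("Offline kwargs:", enable_offline_mode())
```

In practice you would call `enable_offline_mode()` before `from fastembed import TextEmbedding` and pass the result on, for example `TextEmbedding(model_name="BAAI/bge-small-en-v1.5", **kwargs)`. This only works if the model is already in the local cache.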
Install
- pip install fastembed
- pip install fastembed[gpu]
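After installing the GPU variant, one way to confirm that ONNX Runtime can see CUDA is to list its execution providers. `onnxruntime.get_available_providers()` is a real ONNX Runtime function. The wrapper names below are hypothetical, and a missing `onnxruntime` is treated as "no providers" so that the sketch runs anywhere.

```python
from typing import List, Sequence


def available_providers() -> List[str]:
    # ONNX Runtime's registered execution providers, or [] if it is not installed
    try:
        import onnxruntime
    except ImportError:
        return []
    return list(onnxruntime.get_available_providers())


def cuda_ready(providers: Sequence[str]) -> bool:
    # GPU builds of ONNX Runtime register "CUDAExecutionProvider"
    return "CUDAExecutionProvider" in providers


if __name__ == "__main__":
    providers = available_providers()
    print("ONNX Runtime providers:", providers)
    print("CUDA path available:", cuda_ready(providers))
```

Seeing `CUDAExecutionProvider` in the list only means that the installed build supports CUDA. Creating a session can still fail or fall back to CPU if the CUDA or cuDNN libraries are missing.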
Imports
- TextEmbedding
from fastembed import TextEmbedding
- SparseTextEmbedding (sparse model) and SparseEmbedding (its result type)
from fastembed import SparseTextEmbedding, SparseEmbedding
- TextCrossEncoder (reranker)
from fastembed.rerank.cross_encoder import TextCrossEncoder
- LateInteractionTextEmbedding
from fastembed.late_interaction.late_interaction_text_embedding import LateInteractionTextEmbedding
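Sparse models represent each text as a small set of non-zero weights. As far as I know, FastEmbed's sparse results expose these as parallel `indices` and `values` arrays (treat that as an assumption and check it against your installed version). Scoring a query against a document is then a dot product over the indices the two vectors share. The following stdlib-only sketch assumes that representation; `sparse_dot` is a hypothetical helper, not a library function.

```python
from typing import Sequence


def sparse_dot(
    q_indices: Sequence[int],
    q_values: Sequence[float],
    d_indices: Sequence[int],
    d_values: Sequence[float],
) -> float:
    # Lookup from token index to weight for the document vector
    doc = dict(zip(d_indices, d_values))
    # Multiply and sum only where both vectors have a non-zero entry
    return sum(w * doc[i] for i, w in zip(q_indices, q_values) if i in doc)


if __name__ == "__main__":
    # Toy vectors; real ones would come from a sparse model's embed() output
    score = sparse_dot([3, 17, 42], [0.5, 1.0, 0.25], [17, 42, 99], [2.0, 4.0, 1.0])
    print(score)  # shared indices 17 and 42: 1.0*2.0 + 0.25*4.0 = 3.0
```

Building a dict for one side makes the cost linear in the number of non-zero entries, which is the main reason sparse vectors stay cheap to score.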
Quickstart
from fastembed import TextEmbedding
# Initialize the embedding model; it is downloaded on first use if not cached.
# Pass specific_model_path to load a local model, or local_files_only=True to forbid downloads.
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
documents = [
    "This is a document about the weather in London. It's quite rainy.",
    "The quick brown fox jumps over the lazy dog.",
    "Python is a high-level, interpreted programming language.",
]
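# Embed the documents. embed() returns a generator, so materialize it
# with list() before calling len() or indexing into the results.
embeddings = list(model.embed(documents))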
print(f"Generated {len(embeddings)} embeddings.")
print(f"First embedding shape: {embeddings[0].shape}")
print(f"First embedding (first 5 values): {embeddings[0][:5]}")
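For retrieval, a typical next step is to rank the document embeddings against a query embedding by cosine similarity. The sketch below uses only the standard library, so it works on plain lists as well as on the NumPy arrays that `embed()` yields. The `cosine` and `rank` helpers are hypothetical, and the 3-d toy vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is zero
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    # (document index, score) pairs, best match first
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
    print(rank([1.0, 0.1, 0.0], docs))
```

With the quickstart model, you might embed a query with something like `query_vec = next(model.embed(["Is it raining in London?"]))` and then call `rank(query_vec, embeddings)`. For large corpora, a vectorized NumPy version or a vector database is the more practical choice.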