Sentence Transformers
Framework for computing dense sentence/text/image embeddings using transformer models. Primary use cases: semantic search, semantic similarity, clustering, and reranking. Wraps transformers and provides SentenceTransformer (embedding), CrossEncoder (reranker), and SparseEncoder (sparse embedding) classes. 15,000+ pretrained models on HF Hub. Now officially maintained by Hugging Face (Tom Aarsen) after transfer from UKP Lab/TU Darmstadt. Package name: sentence-transformers (hyphen). Import name: sentence_transformers (underscore).
Warnings
- breaking Python 3.10+ required as of sentence-transformers 5.0. Python 3.9 and below will fail to install.
- breaking sentence-transformers 5.2.2+ dropped the requests dependency in favor of optional httpx, aligning with transformers v5. Code that relied on requests being transitively installed via sentence-transformers may see ImportError on requests.
- breaking Training with sentence-transformers 5.x requires pinning to a compatible transformers version. sentence-transformers 5.2.3 introduced a compatibility fix for transformers v5.2 Trainer changes; pairing older sentence-transformers 5.x releases with transformers v5.2 causes training failures at the logging step.
- gotcha encode() returns numpy float32 arrays by default, not torch tensors. Passing embeddings directly to PyTorch operations without converting first causes TypeError. Many tutorials omit this.
- gotcha CrossEncoder and SentenceTransformer are architecturally different and not interchangeable. CrossEncoder scores (query, doc) pairs — it cannot encode a corpus of documents independently. Using CrossEncoder where SentenceTransformer is needed produces wrong results with no error.
- gotcha util.cos_sim() returns values in [-1, 1]. It does NOT return [0, 1]. Thresholding at 0.5 as a "similarity cutoff" is a common mistake — the actual meaningful threshold depends on the model and task.
- gotcha Package name is sentence-transformers (hyphen) but import name is sentence_transformers (underscore). import sentence-transformers raises SyntaxError. from sentence-transformers import ... also fails.
Install
- pip install sentence-transformers
- pip install sentence-transformers[train]
- pip install sentence-transformers[onnx]
- pip install sentence-transformers[onnx-gpu]
- pip install sentence-transformers[openvino]
Imports
- SentenceTransformer
from sentence_transformers import SentenceTransformer
- CrossEncoder
from sentence_transformers import CrossEncoder
- util.cos_sim
from sentence_transformers import util
Quickstart
from sentence_transformers import SentenceTransformer, util
# Load model (downloads on first use, ~90MB for MiniLM)
model = SentenceTransformer("all-MiniLM-L6-v2")
# Encode sentences → numpy float32 arrays by default
sentences = [
"The cat sat on the mat.",
"A feline rested on a rug.",
"The stock market crashed today."
]
embeddings = model.encode(sentences) # shape: (3, 384)
print(embeddings.shape)
# Cosine similarity
cosine_scores = util.cos_sim(embeddings[0], embeddings[1:])
print(cosine_scores) # similar pair scores higher
# Return torch tensors instead of numpy
embeddings_tensor = model.encode(sentences, convert_to_tensor=True)
# Semantic search
query_embedding = model.encode("Where did the cat sleep?", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, embeddings_tensor, top_k=2)
print(hits)