Massive Text Embedding Benchmark (MTEB)
MTEB (Massive Text Embedding Benchmark) is a Python framework for evaluating embedding models and retrieval systems across diverse NLP tasks, including classification, clustering, retrieval, reranking, and semantic textual similarity. It supports over 1000 languages and multiple modalities, such as text and image, and is continuously expanding. As of version 2.12.16, it aims to provide a standardized, comprehensive, and reproducible way to compare embedding models. The library maintains a frequent release cadence, with minor updates often landing weekly.
Warnings
- breaking MTEB v2 introduced a large-scale refactor with breaking changes, particularly affecting direct use of the `mteb.MTEB` class and the `mteb.load_results` function. Past minor/patch releases have also occasionally introduced breaking changes.
- gotcha Evaluating high-performing or large multilingual models on MTEB can be computationally very expensive, requiring significant GPU resources and time, especially for tasks with large document collections like retrieval.
- gotcha Models excelling on the general MTEB leaderboard might underperform on domain-specific data. The benchmark datasets may not reflect a given domain's documents, user behavior, or query patterns.
- deprecated Directly submitting model results to the MTEB leaderboard by adding metadata to Hugging Face model cards is no longer supported.
- gotcha When evaluating existing models, it is recommended to use `mteb.get_model("{model_name}")` instead of directly using `SentenceTransformer("{model_name}")`. This ensures consistent and reproducible results as it loads the model as MTEB implemented it, accounting for specific normalizations, quantizations, or prompts.
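The last gotcha is easy to underestimate: whether embeddings are L2-normalized before scoring changes similarity values, which is one of the wrapper details `mteb.get_model` handles for you. A minimal, library-free sketch with made-up vectors:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

a, b = [3.0, 4.0], [1.0, 0.0]

raw_score = dot(a, b)                                 # dot product on raw vectors -> 3.0
cosine_score = dot(l2_normalize(a), l2_normalize(b))  # cosine similarity -> 0.6
print(raw_score, cosine_score)
```

If a model was benchmarked with normalization baked in and you load it without, your scores will not be comparable to the leaderboard's.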
Install
- `pip install mteb`
- `uv add mteb`
Imports
- MTEB
from mteb import MTEB
- evaluate
import mteb
results = mteb.evaluate(model, tasks=tasks)
- get_model
import mteb
model = mteb.get_model('sentence-transformers/all-MiniLM-L6-v2')
- get_tasks
import mteb
tasks = mteb.get_tasks(tasks=['Banking77Classification.v2'])
- SentenceTransformer
from sentence_transformers import SentenceTransformer
Quickstart
import mteb
from sentence_transformers import SentenceTransformer
# Select a model to evaluate
model_name = "sentence-transformers/all-MiniLM-L6-v2"
# It's recommended to use mteb.get_model for reproducibility if the model is in MTEB's registry
# Otherwise, SentenceTransformer can be used directly
model = mteb.get_model(model_name) # Will fall back to SentenceTransformer if not registered in MTEB
# Select tasks to run (e.g., a specific classification task)
tasks = mteb.get_tasks(tasks=["Banking77Classification.v2"], languages=["eng"])
# Evaluate the model on the selected tasks
print(f"Running evaluation for {model_name} on {len(tasks)} tasks...")
results = mteb.evaluate(model, tasks=tasks)
print("Evaluation complete. Results:")
# results is a sequence of TaskResult objects, each holding per-split score dicts
for task_result in results:
    print(f"Task: {task_result.task_name}")
    for split, split_scores in task_result.scores.items():
        for scores in split_scores:
            print(f"  {split} main score: {scores['main_score']:.4f}")
            # Task-specific metrics (e.g. accuracy for classification) sit in the same dict
            if "accuracy" in scores:
                print(f"  {split} accuracy: {scores['accuracy']:.4f}")
# To save results to a specific folder
# output_folder = f"./results/{model_name.replace('/', '_')}"
# results = mteb.evaluate(model, tasks=tasks, output_folder=output_folder)
# print(f"Results saved to: {output_folder}")
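Saved results land on disk as one JSON file per task. A library-free sketch of pulling main scores out of such a file once loaded; the sample dict and its field names are illustrative assumptions about the per-split score layout, not a guaranteed schema:

```python
# Made-up dict mimicking the general shape of a saved MTEB task result:
# a task name plus per-split lists of score dicts (exact fields vary by task type).
sample_result = {
    "task_name": "Banking77Classification.v2",
    "scores": {
        "test": [
            {"hf_subset": "default", "main_score": 0.7123, "accuracy": 0.7123},
        ]
    },
}

def main_scores(result):
    """Collect (split, subset, main_score) triples from a result dict."""
    out = []
    for split, entries in result["scores"].items():
        for entry in entries:
            out.append((split, entry.get("hf_subset", "default"), entry["main_score"]))
    return out

for split, subset, score in main_scores(sample_result):
    print(f"{split}/{subset}: {score:.4f}")
```

Flattening results like this makes it easy to drop scores from many models into a table for side-by-side comparison.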