Massive Text Embedding Benchmark (MTEB)

2.12.16 · active · verified Sat Apr 11

MTEB (Massive Text Embedding Benchmark) is a Python framework for evaluating embedding and retrieval systems across diverse NLP tasks, including classification, clustering, retrieval, reranking, and semantic textual similarity. It covers over 1,000 languages and multiple modalities, including text and images, and is continuously expanding. As of version 2.12.16, it aims to provide a standardized, comprehensive, and reproducible way to compare embedding models. The library maintains a frequent release cadence, with minor updates often shipping weekly.

Warnings

Install
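MTEB is distributed on PyPI; the standard installation is:

```shell
pip install mteb
```

If `sentence-transformers` is not pulled into your environment, install it explicitly with `pip install sentence-transformers` before running the quickstart.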

Imports

Quickstart

This quickstart loads a pre-trained Sentence Transformer model, selects an MTEB task, evaluates the model with `mteb.evaluate`, and reads out the resulting scores.

import mteb
from sentence_transformers import SentenceTransformer

# Select a model to evaluate
model_name = "sentence-transformers/all-MiniLM-L6-v2"
# It's recommended to use mteb.get_model for reproducibility if the model is in MTEB's registry
# Otherwise, SentenceTransformer can be used directly
model = mteb.get_model(model_name) # Will fall back to SentenceTransformer if not registered in MTEB

# Select tasks to run (e.g., a specific classification task)
tasks = mteb.get_tasks(tasks=["Banking77Classification.v2"], languages=["eng"])

# Evaluate the model on the selected tasks
print(f"Running evaluation for {model_name} on {len(tasks)} tasks...")
results = mteb.evaluate(model, tasks=tasks)

print("Evaluation complete. Results:")
# mteb.evaluate returns one result object per task; the attribute names used
# here follow mteb's TaskResult (task_name, get_score)
for task_result in results:
    print(f"Task: {task_result.task_name}")
    print(f"  Main score: {task_result.get_score():.4f}")
    # Detailed per-split metrics (e.g. accuracy for classification tasks)
    # are available in the task_result.scores dictionary

# To save results to a specific folder
# output_folder = f"./results/{model_name.replace('/', '_')}"
# results = mteb.evaluate(model, tasks=tasks, output_folder=output_folder)
# print(f"Results saved to: {output_folder}")
