{"id":4124,"library":"mteb","title":"Massive Text Embedding Benchmark (MTEB)","description":"MTEB (Massive Text Embedding Benchmark) is a Python framework for evaluating embedding and retrieval systems across diverse NLP tasks, including classification, clustering, retrieval, reranking, and semantic textual similarity. It supports more than 1,000 languages and multiple modalities, including text and image, and continues to expand. As of version 2.12.16, it aims to provide a standardized, comprehensive, and reproducible way to compare embedding models. The library maintains a frequent release cadence, with minor updates often shipping weekly.","status":"active","version":"2.12.16","language":"en","source_language":"en","source_url":"https://github.com/embeddings-benchmark/mteb","tags":["embeddings","benchmarking","nlp","evaluation","sentence-transformers","huggingface","multimodal","retrieval","classification","clustering","semantic-textual-similarity"],"install":[{"cmd":"pip install mteb","lang":"bash","label":"Install core library"},{"cmd":"uv add mteb","lang":"bash","label":"Faster installation with uv"}],"dependencies":[{"reason":"Runtime environment","package":"python","version":">=3.10, <3.15","optional":false},{"reason":"Commonly used for loading and evaluating many pre-trained models. MTEB also offers its own model loading mechanism.","package":"sentence-transformers","optional":false},{"reason":"Underlying deep learning framework, implicitly required by sentence-transformers and many models.","package":"torch","optional":false},{"reason":"Underlying library for many models and tokenizers.","package":"transformers","optional":false}],"imports":[{"note":"The MTEB class was part of a major refactor in v2; direct use is now less common than 'mteb.evaluate' or 'mteb.get_model'.","wrong":"from mteb.MTEB import MTEB","symbol":"MTEB","correct":"from mteb import MTEB"},{"note":"The recommended way to run evaluations.","symbol":"evaluate","correct":"import mteb\nresults = mteb.evaluate(model, tasks=tasks)"},{"note":"Recommended for loading existing models as implemented in MTEB for reproducibility.","symbol":"get_model","correct":"import mteb\nmodel = mteb.get_model('sentence-transformers/all-MiniLM-L6-v2')"},{"note":"Used to select specific benchmark tasks.","symbol":"get_tasks","correct":"import mteb\ntasks = mteb.get_tasks(tasks=['Banking77Classification.v2'])"},{"note":"Used to load models that are not yet directly implemented in MTEB's registry.","symbol":"SentenceTransformer","correct":"from sentence_transformers import SentenceTransformer"}],"quickstart":{"code":"import mteb\n\n# Select a model to evaluate\nmodel_name = \"sentence-transformers/all-MiniLM-L6-v2\"\n# Use mteb.get_model for reproducibility when the model is in MTEB's registry;\n# it falls back to SentenceTransformer for unregistered models.\nmodel = mteb.get_model(model_name)\n\n# Select tasks to run (e.g., a specific classification task)\ntasks = mteb.get_tasks(tasks=[\"Banking77Classification.v2\"], languages=[\"eng\"])\n\n# Evaluate the model on the selected tasks\nprint(f\"Running evaluation for {model_name} on {len(tasks)} tasks...\")\nresults = mteb.evaluate(model, tasks=tasks)\n\nprint(\"Evaluation complete. Results:\")\n# mteb.evaluate yields one result object per task; each exposes the\n# task name and its main score.\nfor task_result in results:\n    print(f\"Task: {task_result.task_name}\")\n    print(f\"  Main score: {task_result.get_score():.4f}\")\n\n# To save results to a specific folder\n# output_folder = f\"./results/{model_name.replace('/', '_')}\"\n# results = mteb.evaluate(model, tasks=tasks, output_folder=output_folder)\n# print(f\"Results saved to: {output_folder}\")","lang":"python","description":"This quickstart loads a pre-trained Sentence Transformer model and evaluates it on a specific MTEB task with `mteb.evaluate`. It shows how to select tasks and read back the per-task results."},"warnings":[{"fix":"Refer to the official MTEB documentation for the updated API, especially `mteb.evaluate`, `mteb.get_model`, and `mteb.get_tasks`. Align your code with the new functional interface rather than instantiating the `MTEB` class directly.","message":"MTEB v2 introduced a large-scale refactor with breaking changes, particularly affecting direct usage of the `mteb.MTEB` class and the `mteb.load_results` function. Earlier minor/patch releases also occasionally introduced breaking changes.","severity":"breaking","affected_versions":"Code written against pre-2.x versions when upgrading to 2.x; occasionally also minor releases before 2.x."},{"fix":"Start with smaller task subsets or mini-benchmarks to estimate resource usage. Consider using optimized models or distributed evaluation setups. MTEB also offers caching mechanisms to speed up repeated evaluations.","message":"Evaluating high-performing or large multilingual models on MTEB can be computationally very expensive, requiring significant GPU resources and time, especially for tasks with large document collections such as retrieval.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always validate model suitability with additional evaluations on your own domain data. MTEB can be extended with custom tasks to facilitate this.","message":"Models that excel on the general MTEB leaderboard may underperform on domain-specific data. The benchmark datasets may not reflect your particular domain, user behavior, or query patterns.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Follow the updated submission guidelines in the MTEB GitHub repository or documentation to ensure results are correctly associated with the model implementation.","message":"Submitting model results to the MTEB leaderboard by adding metadata to Hugging Face model cards is no longer supported.","severity":"deprecated","affected_versions":"Post-1.x (exact version unclear; announced with the v2 refactor)"},{"fix":"Replace `model = SentenceTransformer(model_name)` with `model = mteb.get_model(model_name)`. MTEB's function falls back to `SentenceTransformer` if the model isn't specifically registered.","message":"When evaluating existing models, use `mteb.get_model(\"{model_name}\")` instead of `SentenceTransformer(\"{model_name}\")`. This ensures consistent, reproducible results because the model is loaded as MTEB implemented it, including any model-specific normalization, quantization, or prompts.","severity":"gotcha","affected_versions":"All versions; particularly relevant for models already on the MTEB leaderboard."}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}