{"id":1829,"library":"evaluate","title":"Hugging Face Evaluate","description":"Evaluate is a community-driven open-source library from Hugging Face that provides a standardized interface for accessing and comparing more than 80 evaluation metrics for machine learning models and datasets. It offers a consistent API across tasks and is actively maintained with frequent patch releases, currently at version 0.4.6.","status":"active","version":"0.4.6","language":"en","source_language":"en","source_url":"https://github.com/huggingface/evaluate","tags":["huggingface","nlp","machine-learning","evaluation","metrics","ai","mlops"],"install":[{"cmd":"pip install evaluate","lang":"bash","label":"Install core library"},{"cmd":"pip install evaluate[full]","lang":"bash","label":"Install with common optional dependencies"}],"dependencies":[{"reason":"Required for authentication, loading metrics from the Hugging Face Hub, and interacting with Hub features.","package":"huggingface_hub","optional":false},{"reason":"Required for loading evaluation datasets and often used alongside evaluate for data handling.","package":"datasets","optional":true},{"reason":"Required for certain metrics such as 'perplexity' or 'bertscore' that depend on transformer models.","package":"transformers","optional":true},{"reason":"Required for certain NLP-focused metrics such as 'meteor' or 'rouge'.","package":"nltk","optional":true}],"imports":[{"note":"Most users import the `evaluate` module and then call `evaluate.load()`.","wrong":"from evaluate import load  # not an error, but 'import evaluate' followed by evaluate.load(...) is the conventional pattern","symbol":"load","correct":"import evaluate\nmetric = evaluate.load(\"accuracy\")"}],"quickstart":{"code":"import evaluate\n\n# Load an evaluation metric\naccuracy_metric = evaluate.load(\"accuracy\")\n\n# Prepare dummy predictions and references\npredictions = [0, 1, 0, 1, 0]\nreferences = [0, 
1, 1, 0, 0]\n\n# Compute the metric\nresults = accuracy_metric.compute(predictions=predictions, references=references)\nprint(f\"Accuracy results: {results}\")\n\n# Loading a metric hosted in a private repo on the Hub requires authentication.\n# Pass a token via the `token` parameter ('accuracy' itself needs no token;\n# the repo name below is a placeholder):\n# import os\n# token = os.environ.get(\"HF_TOKEN\")\n# if token:\n#     private_metric = evaluate.load(\"username/my-private-metric\", token=token)\n\n# Example with a metric-specific configuration (BERTScore also requires `pip install bert_score`)\n# bertscore_metric = evaluate.load(\"bertscore\")\n# predictions_text = [\"The cat sat on the mat.\", \"The dog ate the bone.\"]\n# references_text = [[\"A cat was on the mat.\"], [\"A dog consumed the bone.\"]]\n# bertscore_results = bertscore_metric.compute(predictions=predictions_text, references=references_text, lang=\"en\")\n# print(f\"BERTScore F1 (first example): {bertscore_results['f1'][0]}\")\n","lang":"python","description":"This quickstart loads the 'accuracy' metric with `evaluate.load()` and computes it on sample predictions and references. 'accuracy' needs no authentication; the commented section shows how a token read from the `HF_TOKEN` environment variable would be passed via the `token` parameter when loading metrics from private repositories on the Hugging Face Hub. 
Remember that many metrics require specific input formats (e.g., raw text for BERTScore) and may need additional `pip install` commands for their dependencies."},"warnings":[{"fix":"Update your code to pass `token='your_huggingface_token'` instead of `use_auth_token=True` or `use_auth_token='your_token'` when loading metrics or interacting with the Hub.","message":"The `use_auth_token` parameter has been deprecated across the Hugging Face ecosystem, including `evaluate`, and replaced by `token` for authentication.","severity":"breaking","affected_versions":"<0.4.3"},{"fix":"Upgrade `huggingface_hub` to version `1.0.0` or higher to stay compatible and use the current authentication mechanisms; older `huggingface_hub` versions may cause issues with token loading.","message":"As of v0.4.6, `evaluate` no longer uses the deprecated `HfFolder` class from `huggingface_hub`, which adds support for `huggingface_hub>=1.0`.","severity":"breaking","affected_versions":"<0.4.6"},{"fix":"When `evaluate.load()` raises a `ModuleNotFoundError`, check the metric's documentation on the Hugging Face Hub to identify and install the required extra packages (e.g., `pip install transformers` for 'perplexity', or `pip install evaluate[full]` for a broad set of common dependencies).","message":"Many evaluation metrics and evaluators within the `evaluate` library have external dependencies that are not installed by default, including `nltk`, `scikit-learn`, `transformers`, `datasets`, and `jiwer`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If you encounter unexpected results, restart your environment or clear the cached metric modules (stored under `~/.cache/huggingface/modules/evaluate_modules` by default); the `HF_EVALUATE_OFFLINE` environment variable controls whether cached modules are used without contacting the Hub. Refer to the official documentation for advanced cache management options if needed.","message":"The `evaluate` library caches downloaded metric modules and sometimes computation results. While beneficial for performance, this can lead to unexpected behavior if you modify metric parameters or source data without clearing or understanding the cache.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}