{
"id": 5390,
"library": "pycocoevalcap",
"title": "MS-COCO Caption Evaluation",
"description": "pycocoevalcap provides Python 3 support for evaluating image captions with the standard MS-COCO metrics (BLEU, METEOR, ROUGE-L, CIDEr, SPICE). It is derived from the original Python 2.7 coco-caption repository and depends on the COCO API. The latest release is version 1.2 (November 2020), reflecting a maintenance-oriented release cadence.",
"status": "active",
"version": "1.2",
"language": "en",
"source_language": "en",
"source_url": "https://github.com/salaniz/pycocoevalcap",
"tags": ["coco", "image-captioning", "evaluation", "metrics", "nlp"],
"install": [{"cmd": "pip install pycocoevalcap", "lang": "bash", "label": "PyPI"}],
"dependencies": [
{"reason": "Required for COCO API interaction and data structures.", "package": "pycocotools", "optional": false},
{"reason": "Required for the SPICE and PTBTokenizer components. Stanford CoreNLP is downloaded automatically by SPICE on first use.", "package": "Java 1.8.0", "optional": false}
],
"imports": [
{"note": "This class orchestrates the evaluation of all metrics.", "symbol": "COCOEvalCap", "correct": "from pycocoevalcap.eval import COCOEvalCap"},
{"note": "Import individual metric scorers if you need granular control or specific metrics.", "symbol": "Bleu", "correct": "from pycocoevalcap.bleu.bleu import Bleu"},
{"note": "Import individual metric scorers if you need granular control or specific metrics.", "symbol": "Meteor", "correct": "from pycocoevalcap.meteor.meteor import Meteor"},
{"note": "Import individual metric scorers if you need granular control or specific metrics.", "symbol": "Rouge", "correct": "from pycocoevalcap.rouge.rouge import Rouge"},
{"note": "Import individual metric scorers if you need granular control or specific metrics.", "symbol": "Cider", "correct": "from pycocoevalcap.cider.cider import Cider"},
{"note": "Import individual metric scorers if you need granular control or specific metrics.", "symbol": "Spice", "correct": "from pycocoevalcap.spice.spice import Spice"}
],
"quickstart": {
"code": "from pycocoevalcap.eval import COCOEvalCap\n\n# Mock ground-truth and predicted caption data.\n# In a real scenario, these would be loaded from JSON files in COCO format:\n# the ground truth maps each image_id to a list of reference captions,\n# the results map each image_id to a single predicted caption.\n\ngts_data = {\n    \"annotations\": [\n        {\"image_id\": 1, \"id\": 101, \"caption\": \"A man is riding a bicycle.\"},\n        {\"image_id\": 1, \"id\": 102, \"caption\": \"A person on a bike on a street.\"},\n        {\"image_id\": 2, \"id\": 201, \"caption\": \"Two dogs playing in the grass.\"},\n        {\"image_id\": 2, \"id\": 202, \"caption\": \"Dogs are running on a lawn.\"}\n    ]\n}\n\nres_data = [\n    {\"image_id\": 1, \"caption\": \"A man cycling on a road.\", \"id\": 301},\n    {\"image_id\": 2, \"caption\": \"Two puppies in a field.\", \"id\": 302}\n]\n\n# COCOEvalCap expects COCO objects for the ground truth and the results.\n# These are normally created from JSON files in the COCO format; this\n# minimal mock implements only the parts of the COCO API that\n# COCOEvalCap actually uses (anns, imgToAnns, getImgIds, loadRes).\nclass MockCoco:\n    def __init__(self, data):\n        self.anns = {ann['id']: ann for ann in data.get('annotations', [])}\n        self.imgToAnns = {}\n        for ann in data.get('annotations', []):\n            self.imgToAnns.setdefault(ann['image_id'], []).append(ann)\n\n    def loadRes(self, res_json_or_list):\n        # Simplified: the real COCO.loadRes performs more validation.\n        res_anns = []\n        for r in res_json_or_list:\n            # Assign a unique ID if not present, similar to COCO API behavior.\n            if 'id' not in r:\n                r['id'] = max(self.anns.keys(), default=0) + len(res_anns) + 1\n            res_anns.append(r)\n        return MockCoco({'annotations': res_anns})\n\n    def getImgIds(self):\n        return list(self.imgToAnns.keys())\n\n    def loadAnns(self, ids):\n        return [self.anns[i] for i in ids]\n\n# gts_coco_obj = COCO(gts_json_path)                  # in a real application\ngts_coco_obj = MockCoco(gts_data)\n\n# res_coco_obj = gts_coco_obj.loadRes(res_json_path)  # in a real application\nres_coco_obj = gts_coco_obj.loadRes(res_data)\n\n# COCOEvalCap takes the ground-truth and result COCO objects; the image\n# IDs to evaluate are set through the 'image_id' parameter.\ncocoEval = COCOEvalCap(gts_coco_obj, res_coco_obj)\ncocoEval.params['image_id'] = gts_coco_obj.getImgIds()\n\ncocoEval.evaluate()\n\nprint(\"Evaluation results:\")\nfor metric, score in cocoEval.eval.items():\n    print(f\"{metric}: {score:.3f}\")",
"lang": "python",
"description": "This quickstart demonstrates how to set up and run the evaluation with `COCOEvalCap`. It uses mocked ground-truth and predicted caption data to illustrate the expected data structures. In a real application, you would load your ground-truth annotations into a `pycocotools.coco.COCO` object and create the results object with its `loadRes` method. The `evaluate()` method then computes all standard metrics."
},
"warnings": [
{"fix": "Install Java 1.8.0+ and verify that it is accessible via your system PATH. On Windows, setting the `_JAVA_OPTIONS` environment variable to `-Xmx1024M` can resolve JVM memory errors.", "message": "Java 1.8.0 is a mandatory runtime dependency for SPICE and the PTBTokenizer. Ensure Java is installed and properly configured in your PATH, or you may encounter `java.lang.UnsatisfiedLinkError` or `CalledProcessError` failures during evaluation.", "severity": "gotcha", "affected_versions": "All versions"},
{"fix": "Ensure you have the necessary C/C++ build tools for your operating system before running `pip install pycocoevalcap`. On Windows, this usually means installing 'Desktop development with C++' from the Visual Studio Installer.", "message": "The `pycocotools` dependency can be challenging to install, especially on Windows. It requires a C compiler (build tools for Visual Studio on Windows, or `gcc` on Linux/macOS) to compile its C extensions.", "severity": "gotcha", "affected_versions": "All versions"},
{"fix": "Ensure stable internet connectivity for the initial download. Check write permissions for the `./spice/cache/` directory (or configure `CACHE_DIR` in `./spice/spice.py`). Verify Java is correctly installed and configured.", "message": "SPICE (Semantic Propositional Image Caption Evaluation) automatically downloads Stanford CoreNLP models on its first run. This process can fail due to network issues, missing permissions on the cache directory, or Java environment problems, leading to 'Could not cache item for SPICE' or similar errors.", "severity": "gotcha", "affected_versions": "All versions"},
{"fix": "Evaluate CIDEr over a corpus of multiple image-caption pairs for meaningful scores. When testing individual pairs, be aware that a score of 0 may reflect the metric's design rather than a complete mismatch.", "message": "The CIDEr metric, due to its TF-IDF weighting, may return a score of 0 when evaluating only a single ground-truth/prediction pair; it is designed for corpus-level evaluation.", "severity": "gotcha", "affected_versions": "All versions"},
{"fix": "Refer to GitHub issues for potential patches or workarounds (e.g., modifications to `meteor.py` as suggested in Issue #19). Ensure input data formats are strictly followed.", "message": "Users have reported issues with METEOR score computation, sometimes leading to `subprocess.CalledProcessError` or incorrect score aggregation, possibly due to changes in the underlying Java METEOR implementation or inconsistencies in data processing.", "severity": "gotcha", "affected_versions": "All versions"}
],
"env_vars": null,
"last_verified": "2026-04-13T00:00:00.000Z",
"next_check": "2026-07-12T00:00:00.000Z"
}