MS-COCO Caption Evaluation

1.2 · active · verified Mon Apr 13

pycocoevalcap provides Python 3 support for evaluating image captions with the standard MS-COCO metrics (BLEU, METEOR, ROUGE-L, CIDEr, and SPICE). It is a port of the original Python 2.7 coco-caption repository and depends on the COCO API (pycocotools). The latest version, 1.2, was released in November 2020; the project is in maintenance mode.

Warnings

The PTB tokenizer and the METEOR and SPICE scorers shell out to bundled Java tools, so a Java runtime (1.8+) must be available on your PATH.

Install

pip install pycocoevalcap

Imports

from pycocoevalcap.eval import COCOEvalCap

Quickstart

This quickstart demonstrates how to set up and run the evaluation with `COCOEvalCap`. It uses mocked ground truth and predicted caption data to illustrate the expected data structures. In a real application, you would load your ground truth annotations into a `pycocotools.coco.COCO` object and derive a second `COCO` object for your predictions via `loadRes`. `COCOEvalCap` is constructed from these two objects, and its `evaluate()` method computes all standard metrics.
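Whatever form your model outputs take, predictions must first be converted to the COCO results format: a flat JSON list of records with `image_id` and `caption` keys. A minimal sketch, assuming raw outputs arrive as a plain dict (the `predictions` dict here is hypothetical):

```python
import json

# Hypothetical raw model outputs: image_id -> predicted caption
predictions = {1: "A man cycling on a road.", 2: "Two puppies in a field."}

# Convert to the COCO results format expected by COCO.loadRes()
results = [{"image_id": img_id, "caption": caption}
           for img_id, caption in predictions.items()]

print(json.dumps(results))
```

This list can be serialized with `json.dump` and passed to `loadRes` as a file path, or handed to it directly as a Python list.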

import json
from pycocoevalcap.eval import COCOEvalCap

# Mock ground truth and predicted captions data
# In a real scenario, these would be loaded from JSON files
# 'gts' should map image_id to a list of ground truth captions
# 'res' should map image_id to a list of predicted captions

gts_data = {
    "annotations": [
        {"image_id": 1, "id": 101, "caption": "A man is riding a bicycle."}, 
        {"image_id": 1, "id": 102, "caption": "A person on a bike on a street."}, 
        {"image_id": 2, "id": 201, "caption": "Two dogs playing in the grass."}, 
        {"image_id": 2, "id": 202, "caption": "Dogs are running on a lawn."}
    ]
}

res_data = [
    {"image_id": 1, "caption": "A man cycling on a road.", "id": 301}, 
    {"image_id": 2, "caption": "Two puppies in a field.", "id": 302}
]

# To initialize COCOEvalCap, you need COCO objects for ground truth and results.
# These COCO objects are typically created from JSON files matching the COCO format.
# For a quickstart, we'll manually structure the data to match expected input.

# The COCO object expects a dictionary with 'images' and 'annotations' keys.
# We only need 'annotations' for caption evaluation.

# Mock COCO objects (simplified for quickstart, actual COCO objects handle more fields)
class MockCoco:
    def __init__(self, data):
        self.anns = {ann['id']: ann for ann in data.get('annotations', [])}
        self.imgToAnns = {}
        for ann in data.get('annotations', []):
            self.imgToAnns.setdefault(ann['image_id'], []).append(ann)

    def loadRes(self, res_json_or_list):
        # For simplicity, just store results. COCO.loadRes is more complex.
        res_anns = []
        for r in res_json_or_list:
            # Assign a unique ID if not present, similar to COCO API behavior
            if 'id' not in r:
                r['id'] = max(self.anns.keys(), default=0) + len(res_anns) + 1
            res_anns.append(r)
        res_coco = MockCoco({'annotations': res_anns})
        return res_coco

    def getImgIds(self):
        return list(self.imgToAnns.keys())

    def loadAnns(self, ids):
        return [self.anns[i] for i in ids]

# Initialize Mock COCO objects
# gts_coco_obj = COCO(gts_json_path)  # In a real application
gts_coco_obj = MockCoco(gts_data)

# res_coco_obj = gts_coco_obj.loadRes(res_json_path) # In a real application
res_coco_obj = gts_coco_obj.loadRes(res_data)

eval_ids = gts_coco_obj.getImgIds()

# COCOEvalCap takes exactly two arguments: the ground-truth COCO object
# and the results COCO object returned by loadRes()
cocoEval = COCOEvalCap(gts_coco_obj, res_coco_obj)

# Restrict evaluation to the images we have predictions for
cocoEval.params['image_id'] = eval_ids

cocoEval.evaluate()

print("Evaluation results:")
for metric, score in cocoEval.eval.items():
    print(f"{metric}: {score:.3f}")
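Against real data, the same flow reads COCO-format JSON files through pycocotools instead of the mock above. The sketch below only prepares the two files; the pycocotools and evaluation calls, which additionally require Java for the tokenizer, are shown as comments (the file paths are temporary and illustrative):

```python
import json
import os
import tempfile

# Minimal ground-truth file in COCO caption format ('images' and 'annotations')
gts = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"image_id": 1, "id": 101, "caption": "A man is riding a bicycle."},
        {"image_id": 2, "id": 201, "caption": "Two dogs playing in the grass."},
    ],
}
# Results file: a flat list of predictions
res = [
    {"image_id": 1, "caption": "A man cycling on a road."},
    {"image_id": 2, "caption": "Two puppies in a field."},
]

tmp = tempfile.mkdtemp()
gts_path = os.path.join(tmp, "captions.json")
res_path = os.path.join(tmp, "results.json")
with open(gts_path, "w") as f:
    json.dump(gts, f)
with open(res_path, "w") as f:
    json.dump(res, f)

# With pycocotools installed and Java available:
# from pycocotools.coco import COCO
# from pycocoevalcap.eval import COCOEvalCap
# coco = COCO(gts_path)
# cocoRes = coco.loadRes(res_path)
# cocoEval = COCOEvalCap(coco, cocoRes)
# cocoEval.params['image_id'] = cocoRes.getImgIds()
# cocoEval.evaluate()
print(gts_path, res_path)
```

Note that `loadRes` intersects the result image ids with the ground-truth `images` list, so every `image_id` in the results file must also appear in the annotation file.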
