{"id":6275,"library":"trec-car-tools","title":"TREC CAR Tools","description":"trec-car-tools is a Python library (version 2.6, released Feb 1, 2022) providing support for participants in the TREC Complex Answer Retrieval (CAR) track. It provides functions for reading and manipulating the TREC CAR dataset (annotations, paragraphs, and outlines), which is typically distributed in CBOR format. The library's release cadence appears to be tied to major TREC CAR track version releases.","status":"active","version":"2.6","language":"en","source_language":"en","source_url":"https://github.com/TREMA-UNH/trec-car-tools/python3","tags":["TREC","information retrieval","CAR dataset","CBOR","data parsing"],"install":[{"cmd":"pip install trec-car-tools","lang":"bash","label":"PyPI"},{"cmd":"conda install laura-dietz::trec-car-tools","lang":"bash","label":"Anaconda"}],"dependencies":[{"reason":"Required for handling CBOR formatted data. For Anaconda users, a specific version and channel are recommended.","package":"cbor","optional":false}],"imports":[{"symbol":"iter_annotations","correct":"from trec_car.read_data import iter_annotations"},{"symbol":"iter_paragraphs","correct":"from trec_car.read_data import iter_paragraphs"},{"symbol":"Page","correct":"from trec_car.read_data import Page"},{"symbol":"Paragraph","correct":"from trec_car.read_data import Paragraph"}],"quickstart":{"code":"import os\nfrom itertools import islice\n\nfrom trec_car.read_data import iter_annotations, iter_paragraphs\n\n# Assuming 'train.test200.cbor' and 'train.test200.cbor.paragraphs' are available locally\n# You would typically download these from the TREC CAR website\n\n# Example 1: Reading annotations (pages file)\nannotations_file = os.environ.get('TREC_CAR_ANNOTATIONS_FILE', 'train.test200.cbor')\nif os.path.exists(annotations_file):\n    print(f\"\\nReading page IDs from {annotations_file}:\")\n    with open(annotations_file, 'rb') as f:\n        # Print the first 2 pages only for brevity\n        for page in islice(iter_annotations(f), 2):\n            print(f\"Page ID: {page.page_id}, Page 
Name: {page.page_name}\")\nelse:\n    print(f\"\\nSkipping annotation reading: {annotations_file} not found.\")\n\n# Example 2: Reading paragraphs file\nparagraphs_file = os.environ.get('TREC_CAR_PARAGRAPHS_FILE', 'train.test200.cbor.paragraphs')\nif os.path.exists(paragraphs_file):\n    print(f\"\\nReading paragraph text from {paragraphs_file}:\")\n    with open(paragraphs_file, 'rb') as f:\n        # Print the first 2 paragraphs only for brevity\n        for para in islice(iter_paragraphs(f), 2):\n            print(f\"Paragraph ID: {para.para_id}, Text: {para.get_text()[:100]}...\")\nelse:\n    print(f\"\\nSkipping paragraph reading: {paragraphs_file} not found.\")","lang":"python","description":"This quickstart demonstrates how to read TREC CAR annotation and paragraph files using the `iter_annotations` and `iter_paragraphs` functions. It assumes the dataset files are available locally. The code prints the IDs and names of the first two pages, then the IDs and the first 100 characters of text of the first two paragraphs."},"warnings":[{"fix":"Always check the TREC CAR website (trec-car.cs.unh.edu) for the recommended `trec-car-tools` version or branch corresponding to the dataset you are using. Older tools might not correctly parse newer data formats, and vice-versa.","message":"Data format versions for TREC CAR datasets can change between releases. 
Ensure you use a version of `trec-car-tools` compatible with your specific dataset version.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use `conda install -c laura-dietz cbor=1.0.0` instead of `pip install cbor` or `conda install cbor` if encountering issues with CBOR data processing in an Anaconda environment.","message":"Anaconda users should install the `cbor` dependency from the `laura-dietz` channel for Python 3.6 to ensure compatibility.","severity":"gotcha","affected_versions":"All versions, specifically Python 3.6+ with Anaconda"},{"fix":"Be aware of these known issues, especially when processing specific dataset versions or structures. Refer to the GitHub issues page (github.com/TREMA-UNH/trec-car-tools/issues) for updates or workarounds.","message":"The GitHub issue tracker indicates several open issues, some of which suggest potential data parsing or consistency problems within the tools, such as `flat_headings_list is not flat` or `v2.0 dataset para id in manual qrels not found in paragraphCorpus`.","severity":"gotcha","affected_versions":"All versions (potential for certain data formats)"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z"}