coffea

2026.4.0 · active · verified Wed Apr 15

coffea is a Python toolkit for columnar data analysis in High-Energy Physics (HEP), providing basic tools and wrappers for efficient manipulation of HEP event data. It integrates with modern big-data technologies like Dask, Parsl, and TaskVine to enable scaling analyses from local machines to computing clusters without code changes. The library is actively developed and currently at version 2026.4.0; calendar-versioned releases appear frequently (often monthly or bi-monthly), and backports are maintained for the 0.7.x branch.

Warnings

Install

Imports

Quickstart

This quickstart defines a simple `coffea` processor for a basic analysis task: selecting opposite-sign dimuons and histogramming a per-pair quantity. It subclasses `coffea.processor.ProcessorABC` and runs through `processor.Runner` with the `IterativeExecutor` for local execution. Two parts are stand-ins. When the input has no `Muon` collection, the processor falls back to a small hand-built one, and the "mass" that fills the histogram is a placeholder (the sum of the two muon pT values), not a true invariant mass. The `fileset` entry `dummy_file.root` is likewise a placeholder path. In a real application, the events are loaded from ROOT files through `NanoEventsFactory`.

import awkward as ak
import hist
from coffea import processor
from coffea.nanoevents import NanoEventsFactory, BaseSchema

# Define a simple processor
class MyProcessor(processor.ProcessorABC):
    def process(self, events):
        # Real input (e.g. NanoAOD read through NanoEventsFactory) already has
        # a 'Muon' collection. The fallback below exists only for illustration.
        if "Muon" not in events.fields:
            # Hand-built muons for the first two events, padded with empty lists
            # so that there is exactly one entry per event. Values are chosen so
            # that one opposite-sign pair survives the pt cut.
            n_pad = max(len(events) - 2, 0)
            dummy_muons = ak.zip({
                "pt": ak.Array([[25.0, 20.0], [30.0]] + [[]] * n_pad),
                "eta": ak.Array([[0.5, 1.2], [-0.8]] + [[]] * n_pad),
                "charge": ak.Array([[1, -1], [1]] + [[]] * n_pad),
            })[: len(events)]
            events = ak.with_field(events, dummy_muons, "Muon")
            
        muons = events.Muon[events.Muon.pt > 15]
        
        # Select opposite-sign dimuons
        dimuons = ak.combinations(muons, 2, fields=["lead", "trail"])
        dimuons = dimuons[dimuons.lead.charge != dimuons.trail.charge]
        
        # Placeholder "mass": the sum of the two muon pT values, NOT a true
        # invariant mass. A real analysis would use vector-like operations
        # (build Lorentz vectors and take the mass of their sum).
        # ak.flatten returns an empty array when no pair is selected, so no
        # explicit length check is needed.
        mass = ak.flatten(dimuons.lead.pt + dimuons.trail.pt)

        # Create a histogram and fill it
        h_mass = hist.Hist.new.Reg(50, 0, 100, name="mass", label="Dimuon Mass [GeV]").Double()
        h_mass.fill(mass=mass)

        return {"mymass_histogram": h_mass, "nevents": len(events)}

    def postprocess(self, accumulator):
        return accumulator

# Placeholder fileset: replace "dummy_file.root" with the path to a real ROOT file
fileset = {"dataset_A": ["dummy_file.root"]}
# Small in-memory events array (no file I/O). It can be handed to
# MyProcessor().process(...) for a quick check, provided the fallback muons
# inside process() have one entry per event.
dummy_events = ak.zip({"event_id": ak.Array([1, 2, 3])})

# Instantiate the processor
my_processor_instance = MyProcessor()

# Run the processor with a local executor.
# Runner opens the files listed in fileset, so this step needs a real ROOT
# file containing an "Events" tree. Arguments are passed by keyword so the
# call does not depend on their positional order.
output = processor.Runner(
    executor=processor.IterativeExecutor(status=False),
    schema=BaseSchema,
)(fileset, treename="Events", processor_instance=my_processor_instance)

print(output["mymass_histogram"])
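
The quickstart fills its histogram with the sum of the two muon pT values as a stand-in for the mass. As a rough sketch of the real quantity, the function below computes the invariant mass of two muons treated as massless, m = sqrt(2 · pT1 · pT2 · (cosh Δη − cos Δφ)). The name `dimuon_mass` and the `phi` inputs are assumptions made for illustration (the dummy muons above carry no `phi`), and the helper is not part of coffea. It uses only the standard library and works on scalars; a columnar analysis would apply the same formula to whole arrays.

```python
import math


def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two particles in the massless approximation.

    Hypothetical helper for illustration only; not a coffea function.
    Inputs are transverse momenta (GeV), pseudorapidities, and azimuthal
    angles (radians).
    """
    m_squared = 2.0 * pt1 * pt2 * (
        math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m_squared, 0.0))


# Two back-to-back muons with pT = 45 GeV at eta = 0
print(dimuon_mass(45.0, 0.0, 0.0, 45.0, 0.0, math.pi))
```

For the back-to-back pair in the usage line, the formula reduces to 2 · pT, i.e. about 90 GeV. Two collinear muons with identical kinematics give zero, which is a quick sanity check on the sign conventions.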
