Uproot
Uproot is a Python library for reading and writing files in the ROOT format, which is widely used in high-energy physics. It implements ROOT I/O in pure Python and NumPy, so it does not require a C++ ROOT installation. As part of the Scikit-HEP project, Uproot integrates with the scientific Python ecosystem, in particular NumPy and Awkward Array. The library is actively maintained. At the time of writing it is at version 5.7.3, and it receives frequent updates and bug fixes.
Warnings
- breaking Uproot 4 and 5 introduced significant API changes relative to Uproot 3, particularly in how TTree methods return arrays and in the underlying Awkward Array version. Uproot 3 used Awkward 0.x, Uproot 4 uses Awkward 1.x, and Uproot 5 uses Awkward 2.x (all imported as `awkward`, without a version suffix, in Uproot 4/5 code).
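- deprecated The Uproot 3 `TTree.array()` method for reading a single branch was removed in Uproot 4 and 5. Read the branch through its TBranch instead (`tree['branch'].array()`), or use `tree.arrays()` to read several branches at once.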
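- gotcha Uproot can return NumPy arrays (`library='np'`), but jagged data (a variable number of values per event) then becomes a NumPy array of `dtype=object` holding one small array per event. Operations on such arrays loop in Python and lose the performance benefit of vectorization; Awkward Array (`library='ak'`, the default) keeps jagged data in a vectorizable, columnar layout.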
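- gotcha Calling `TTree.arrays()` on many branches or on a very large dataset loads everything into memory at once, which can cause out-of-memory errors or severe slowdowns. Read only the branches you need, or process the data in chunks with `TTree.iterate()` (or `uproot.iterate()` across several files).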
- gotcha Writing ROOT files with Uproot 4/5 has limitations compared to the full C++ ROOT implementation. Features such as modifying existing files (`uproot.update`) or directly writing complex nested C++ objects may be only partly supported or come with specific constraints (e.g., on basket sizes).
Install
-
pip install uproot
Imports
- uproot
import uproot
- ak
import awkward as ak
Quickstart
import uproot
import awkward as ak
# Open a remote ROOT file over the XRootD protocol
# (root:// URLs need XRootD support installed, e.g. the fsspec-xrootd package)
# A real-world ROOT file (CMS Open Data example)
file_url = 'root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/ForHiggsTo4Leptons/SMHiggsToZZTo4L.root'
with uproot.open(file_url) as file:
    # List keys (objects) in the ROOT file
    print(f"File keys: {file.keys()}")
    # Access the TTree named 'Events'
    events = file['Events']
    print(f"\nTree name: {events.name}")
    print(f"Number of entries: {events.num_entries}")
    # List branches (columns) in the TTree
    print(f"\nBranches in 'Events': {events.keys()}")
    # Read a single branch (e.g., 'Muon_pt') into an Awkward Array
    muon_pt = events['Muon_pt'].array()
    print(f"\nFirst 5 Muon_pt values: {muon_pt[:5]}")
    print(f"Type of Muon_pt: {type(muon_pt)}")
    # Read several branches at once; with library='ak' (the default) the result
    # is a single Awkward Array of records with one field per branch
    muon_data = events.arrays(['Muon_pt', 'Muon_eta', 'Muon_phi'], library='ak')
    print(f"\nFirst entry of muon_data: {muon_data[0]}")
    print(f"Type of muon_data: {type(muon_data)}")
    # Example: apply a 'cut' and then select on the leading muon.
    # Cut expressions use Python syntax (not C++ '&&'); 'nMuon > 0' drops
    # events without muons so that Muon_pt[:, 0] is safe afterwards.
    if 'nMuon' in events.keys() and 'Muon_pt' in events.keys():
        with_muons = events.arrays(['nMuon', 'Muon_pt'], cut='nMuon > 0', library='ak')
        # [:, 0] takes the first muon of each event (assumed to be the leading one)
        high_pt_muons = with_muons[with_muons.Muon_pt[:, 0] > 20]
        print(f"\nNumber of events with a leading muon pt > 20: {len(high_pt_muons)}")