pyhf - Pure-Python HistFactory
pyhf is a pure-Python implementation of the HistFactory statistical model for binned data analysis, widely used in particle physics. It supports interchangeable computational backends (NumPy, TensorFlow, PyTorch, and JAX), with the tensor-library backends providing automatic differentiation for efficient and scalable statistical inference. The current version is 0.7.6; pyhf follows a regular release cadence in which patch releases address fixes and minor improvements, and minor versions introduce new features and occasionally API changes.
Common errors
- `ModuleNotFoundError: No module named 'tensorflow'` (or `'torch'` or `'jax'`)
  cause: An optional backend (TensorFlow, PyTorch, or JAX) was not installed, but pyhf attempted to use it or it was explicitly set as the backend.
  fix: Install the desired backend via the extras syntax, e.g., `pip install 'pyhf[tensorflow]'`, or set an installed backend explicitly with `pyhf.set_backend('numpy')`.
- `AttributeError: module 'numpy' has no attribute 'product'`
  cause: You are using a pyhf version older than 0.7.3 with NumPy 1.25.0 or newer. `np.product` was deprecated in NumPy 1.25.0 and removed in NumPy 2.0.
  fix: Upgrade pyhf to 0.7.3 or newer (`pip install --upgrade pyhf`), which replaces pyhf's internal use of `np.product` with `np.prod`.
- `pyhf.exceptions.InvalidWorkspace: Schema validation failed`
  cause: The provided JSON or XML workspace definition does not conform to the HistFactory schema expected by pyhf, often due to missing required fields or incorrect data types.
  fix: Review your workspace definition against the HistFactory schema in pyhf's documentation. Ensure all required top-level fields ('channels', 'observations', 'measurements', 'version') are present and correctly formatted, and that data types match expectations.
- `TypeError: 'jax.Array' object cannot be interpreted as an integer`
  cause: This (or a similar `TypeError` for other backends, e.g. `torch.Tensor`) often occurs when mixing array types from different backends (e.g., passing a JAX array to a NumPy-expecting function) or attempting operations the current backend's tensor type does not support.
  fix: Route all tensor operations through `pyhf.tensorlib` and keep arrays consistently within the chosen backend. Avoid mixing `numpy.ndarray` with JAX or PyTorch tensors without explicit conversion.
Warnings
- breaking Version 0.7.0 introduced significant API breaking changes, impacting workspace definition, model creation, and inference calls. Code written for versions prior to 0.7.0 will likely require updates.
- gotcha In NumPy 1.25.0 or higher, `np.product` is deprecated (and removed entirely in NumPy 2.0). pyhf versions prior to 0.7.3 use it internally and may raise deprecation warnings or an `AttributeError`.
- gotcha With `jax` and `jaxlib` versions 0.4.20+, `jax.config` must be accessed from the top-level `jax` API; pyhf releases that reached it through nested module paths can break with these versions.
- gotcha pyhf versions older than 0.7.4 could crash non-deterministically when iterating over backend-change callbacks whose weak references had died, particularly around `pyhf.set_backend` events.
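The `np.product` migration noted above is a one-line rename; `np.prod` is the drop-in replacement:

```python
import numpy as np

# np.product was a deprecated alias of np.prod (deprecated in NumPy 1.25,
# removed in NumPy 2.0); np.prod behaves identically.
values = np.array([2.0, 3.0, 4.0])
total = np.prod(values)  # product of all elements
print(total)  # 24.0
```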
Install
- pip install pyhf
- pip install 'pyhf[tensorflow]'
- pip install 'pyhf[torch]'
- pip install 'pyhf[jax]'
- pip install 'pyhf[full]'
Imports
- pyhf
import pyhf
- set_backend
  import pyhf
  pyhf.set_backend("numpy")
- Model
  import pyhf
  model = pyhf.Model(...)
- fit
  import pyhf
  result = pyhf.infer.mle.fit(...)
- qmu
  import pyhf
  mu_test_stat = pyhf.infer.test_statistics.qmu(...)
Quickstart
import pyhf
import json
# Define a simple workspace (example adapted from pyhf documentation)
workspace_data = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [12.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None},
                        {"name": "lumi", "type": "lumi", "data": None}
                    ]
                },
                {
                    "name": "background",
                    "data": [100.0],
                    "modifiers": [
                        {"name": "lumi", "type": "lumi", "data": None},
                        {"name": "bkg_norm", "type": "normfactor", "data": None}
                    ]
                }
            ]
        }
    ],
    "observations": [
        {"name": "singlechannel", "data": [120.0]}
    ],
    "measurements": [
        {
            "name": "Measurement",
            "config": {
                "poi": "mu",
                "parameters": [
                    # The lumi modifier is constrained; its auxiliary data and
                    # uncertainty are configured here, not on the modifier itself.
                    {"name": "lumi", "auxdata": [1.0], "sigmas": [0.1],
                     "bounds": [[0.5, 1.5]], "inits": [1.0]}
                ]
            }
        }
    ],
    "version": "1.0.0"
}
# Set the backend (e.g., 'numpy', 'tensorflow', 'torch', or 'jax')
pyhf.set_backend("numpy")
# Create a model from the workspace data
workspace = pyhf.Workspace(workspace_data)
model = workspace.model()
# Prepare data and initial parameters for the fit
# Workspace.data combines the observed counts with the model's auxiliary data.
actual_data = workspace.data(model)
init_pars = model.config.suggested_init()
fixed_pars = model.config.suggested_fixed()
bounds = model.config.suggested_bounds()
# Perform Maximum Likelihood Estimation (MLE)
# This fits the model to the data to find the best-fit parameters.
fit_results = pyhf.infer.mle.fit(
data=actual_data,
pdf=model,
init_pars=init_pars,
fixed_params=fixed_pars,
par_bounds=bounds
)
# mle.fit returns the best-fit parameter values (a single tensor by default)
print(f"Best-fit parameters: {fit_results}")
# For uncertainties, use the minuit optimizer and pass return_uncertainties=True:
# pyhf.set_backend("numpy", "minuit")
# pyhf.infer.mle.fit(..., return_uncertainties=True)
# Example for hypothesis testing: fixing 'mu' (signal strength) to 0 (background-only hypothesis)
mu_index = model.config.par_slice('mu').start  # Get index of the 'mu' parameter
bkg_only_init_pars = list(init_pars) # Create a mutable copy
bkg_only_init_pars[mu_index] = 0.0 # Set mu to 0
bkg_only_fixed_params = list(fixed_pars) # Create a mutable copy
bkg_only_fixed_params[mu_index] = True # Fix mu at 0
bkg_only_fit_results = pyhf.infer.mle.fit(
data=actual_data,
pdf=model,
init_pars=bkg_only_init_pars,
fixed_params=bkg_only_fixed_params,
par_bounds=bounds
)
print(f"Fitted parameters (mu=0 fixed): {bkg_only_fit_results[0]}")