xarray-einstats
xarray-einstats is a Python library that provides a high-level API combining `einops`-style array manipulation with `xarray`'s labeled dimensions for statistical operations and linear algebra. It aims to simplify common data analysis tasks on `xarray.DataArray` and `xarray.Dataset` objects, such as reduction, broadcasting, and reshaping with `einops`-like patterns. The current version is 0.10.0. Releases follow no fixed cadence and are typically driven by new features or important bug fixes.
Warnings
- breaking As of version 0.10.0, `xarray-einstats` requires Python 3.12 or newer. This is a significant bump from earlier releases (e.g., 0.9.0 required Python 3.10+).
- breaking Version 0.10.0 introduced minimum version requirements for key dependencies: `xarray>=2024.2.0`, `scipy>=1.11.0`, `numpy>=1.25.0`, and `einops>=0.7.0` (the latter is only needed for the `einops` wrappers). Older versions of these libraries are no longer supported.
- gotcha The `dims` argument of the functions in `xarray_einstats.stats` (e.g., `skew`, `kurtosis`, `hmean`) accepts a single dimension name as a string or a list of names. `einops`-like patterns for reshaping or combining dimensions belong to the separate `xarray_einstats.einops` module (`rearrange`, `reduce`), which needs `einops` installed. Mixing up the two, or misreading the pattern syntax when combining dimensions, is a common source of errors.
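The version requirements in the warnings above can be checked before upgrading. The following is a minimal sketch using only the standard library. The minimums are copied from this document and should be verified against the release notes. The helper names `parse_version` and `check_environment` are hypothetical, not part of `xarray-einstats`.

```python
import re
import sys
from importlib import metadata

# Minimums as stated in the warnings above (assumptions to verify, not authoritative)
MIN_PYTHON = (3, 12)
MIN_DEPS = {"xarray": "2024.2.0", "scipy": "1.11.0", "numpy": "1.25.0"}


def parse_version(version):
    """Turn '1.25.0rc1' into (1, 25, 0): leading digits of each dotted piece, padded to 3."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_environment(python_version=sys.version_info, installed=None):
    """Return a list of problems; an empty list means every check passed.

    `installed` may be a dict of {package: version} to check a hypothetical
    environment; when omitted, the versions are read from the current one.
    """
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python is older than %d.%d" % MIN_PYTHON)
    for name, minimum in MIN_DEPS.items():
        if installed is not None:
            found = installed.get(name)
        else:
            try:
                found = metadata.version(name)
            except metadata.PackageNotFoundError:
                found = None
        if found is None:
            problems.append(f"{name} is not installed")
        elif parse_version(found) < parse_version(minimum):
            problems.append(f"{name} {found} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Calling `check_environment()` with no arguments inspects the running interpreter and installed packages. Passing an explicit dictionary makes it possible to test a planned environment without installing anything.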
Install
-
pip install xarray-einstats
Imports
- stats
import xarray_einstats.stats as xestats
- skew
from xarray_einstats.stats import skew
Quickstart
import numpy as np
import xarray as xr
import xarray_einstats.stats as xestats
from xarray_einstats.einops import rearrange  # needs the optional einops dependency

# Create a dummy xarray DataArray with labeled dimensions 'a', 'b' and 'c'
data = xr.DataArray(
    np.random.normal(size=(2, 3, 4)),
    coords={"a": [0, 1], "b": [0, 1, 2], "c": [0, 1, 2, 3]},
    dims=["a", "b", "c"],
)
print("Original data:\n", data)

# Calculate the skewness along dimension 'b' (a labeled wrapper around scipy.stats.skew)
skew_b = xestats.skew(data, dims="b")
print("\nSkewness along 'b' dimension:\n", skew_b)

# Use an einops-like pattern for more complex operations:
# flatten 'b' and 'c' into a new dimension 'bc', then sum over it
flattened = rearrange(data, "(b c)=bc")
sum_bc_flattened = flattened.sum("bc")
print("\nSum after flattening 'b' and 'c' dimensions:\n", sum_bc_flattened)
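As a cross-check that does not depend on any pattern syntax, the flatten-then-sum step from the Quickstart can be sketched with plain `xarray` alone. This example assumes only `numpy` and `xarray` are installed. The stacked dimension name `bc` is an arbitrary choice for illustration.

```python
import numpy as np
import xarray as xr

# Same shape and labels as the Quickstart data, with a fixed seed for repeatability
rng = np.random.default_rng(0)
data = xr.DataArray(
    rng.normal(size=(2, 3, 4)),
    coords={"a": [0, 1], "b": [0, 1, 2], "c": [0, 1, 2, 3]},
    dims=["a", "b", "c"],
)

# Flatten 'b' and 'c' into one stacked dimension named 'bc'
flat = data.stack(bc=("b", "c"))

# Summing over the stacked dimension leaves only 'a'
sum_flat = flat.sum("bc")

# The same reduction without flattening first
sum_direct = data.sum(dim=["b", "c"])

print("Dims after stacking:", flat.dims)
print("Dims after summing:", sum_flat.dims)
print("Both routes agree:", bool(np.allclose(sum_flat.values, sum_direct.values)))
```

If a pattern-based call returns a result with different dimensions or values than this baseline, the pattern is likely not combining the dimensions that were intended.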