Numbagg: Fast N-dimensional Aggregation
Numbagg provides fast N-dimensional aggregation functions built on Numba's just-in-time (JIT) compiler and NumPy's generalized universal function (gufunc) machinery. For certain operations it aims to outperform pandas, bottleneck, and NumPy, particularly when parallelization is enabled. The current version is 0.9.4, and the project is under active development.
Warnings
- gotcha Numbagg's PyPI page describes it as experimental and not yet ready for production use. Its API and internal workings may change between releases.
- gotcha The first call to any Numbagg function is significantly slower than later calls because Numba compiles the function just in time (JIT). Subsequent calls to the same function with compatible argument types reuse the compiled code and are much faster.
- deprecated Numbagg's grouped calculation functions (e.g., `numbagg.grouped.group_nanmean`) may be deprecated in favor of using `flox` with `numbagg` as a backend. This is still under discussion in the xarray community; `flox` may offer better support for N-dimensional arrays grouped by 1D labels.
- gotcha The `numbagg.decorators` module, used internally for creating JIT-compiled aggregation functions, is not part of Numbagg's public API. Its functions and signatures may change without prior notice.
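The JIT warm-up cost described in the warnings can be observed by timing two consecutive calls. The following is a minimal sketch, not a benchmark: `time_call` is a hypothetical helper introduced here for illustration, and the `try`/`except` fallback to `np.nansum` is an assumption added only so the snippet still runs where numbagg is not installed (in that case no warm-up is visible). Actual timings depend on the machine and on whether compiled code is already cached.

```python
import time

import numpy as np


def time_call(fn, *args, **kwargs):
    """Hypothetical helper: return (result, elapsed_seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


try:
    import numbagg

    nansum = numbagg.nansum
except ImportError:
    # Assumption for illustration: fall back to NumPy so the sketch still runs.
    # Without numbagg there is no JIT compilation step to observe.
    nansum = np.nansum

a = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# The first call may include compilation time; the second reuses compiled code.
first_result, first_seconds = time_call(nansum, a)
second_result, second_seconds = time_call(nansum, a)

print(f"first call:  {first_seconds:.6f} s")
print(f"second call: {second_seconds:.6f} s")
```

In latency-sensitive code, one option is to make a throwaway call on a small array of the same dtype at startup so that the compilation cost is paid before real data arrives.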
Install
pip install numbagg
Imports
- nansum
import numbagg
result = numbagg.nansum(...)
- move_mean
import numbagg
result = numbagg.move_mean(...)
- ndreduce
from numbagg.decorators import ndreduce
Quickstart
import numbagg
import numpy as np
a = np.array([1, 2, np.nan, 4, 5])
b = np.random.rand(10, 5)
# Calculate sum, ignoring NaNs
sum_result = numbagg.nansum(a)
print(f"nansum(a): {sum_result}")
# Calculate moving mean with a window of 3
moving_mean_result = numbagg.move_mean(b, window=3, axis=1)
print(f"move_mean(b, window=3, axis=1, shape): {moving_mean_result.shape}")
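To make the moving-mean output in the Quickstart easier to interpret, here is a NumPy-only reference sketch of a trailing moving mean. It is not Numbagg's implementation. `move_mean_reference` is a hypothetical name, and the convention that positions without a full window are NaN is an assumption about the default behavior. The sketch does not model any NaN-skipping or minimum-count options.

```python
import numpy as np


def move_mean_reference(values, window, axis=-1):
    """Trailing moving mean along `axis` (hypothetical reference helper).

    Assumption: positions that do not yet have a full window are NaN.
    """
    values = np.asarray(values, dtype=float)
    moved = np.moveaxis(values, axis, -1)
    out = np.full(moved.shape, np.nan)
    if window <= moved.shape[-1]:
        # Shape (..., n - window + 1, window): one row per complete window.
        windows = np.lib.stride_tricks.sliding_window_view(moved, window, axis=-1)
        out[..., window - 1:] = windows.mean(axis=-1)
    return np.moveaxis(out, -1, axis)


row_result = move_mean_reference([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(row_result)  # first two entries are NaN, then 2.0, 3.0, 4.0

b = np.random.rand(10, 5)
print(move_mean_reference(b, window=3, axis=1).shape)  # (10, 5)
```

If Numbagg is installed, comparing this against `numbagg.move_mean(b, window=3, axis=1)` with `np.allclose(..., equal_nan=True)` is a quick way to check whether the assumed convention holds for your version.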