numpy-groupies
numpy-groupies is a Python library of optimized tools for group-indexing operations, most notably the `aggregate` function. For specific use cases it is a fast, lightweight alternative to `pandas.groupby`, with implementations in plain NumPy and Numba (older releases also shipped a Weave backend). The current version is 0.11.3, and the project is actively maintained.
Warnings
- gotcha For optimal performance, `numba` should be installed. Without it, `numpy-groupies` will automatically fall back to a slower NumPy-only implementation, potentially leading to unexpected performance degradation.
- breaking Users upgrading to NumPy 2.0 should verify `numpy-groupies` compatibility. NumPy 2.0 introduces significant breaking changes, including an ABI break, changes to type promotion rules, and API modifications that may affect packages depending on it.
- gotcha When using the `aggregate` function with multidimensional arrays and the `axis` argument, carefully review the documentation regarding different 'Forms' of inputs and outputs. The behavior, especially concerning output shapes and broadcasting, can be complex and non-obvious.
- gotcha The interaction of `fill_value` and `dtype` parameters in `aggregate` can lead to implicit type coercion. If `dtype=None`, a 'sensible type' is chosen, which might not always align with user expectations, especially when handling `NaN` values or mixed data types.
Install
pip install numpy-groupies
Imports
- aggregate
from numpy_groupies import aggregate
Quickstart
import numpy as np
from numpy_groupies import aggregate
# Example data: values 'a' to be grouped by 'group_idx'
group_idx = np.array([3, 0, 0, 1, 0, 3, 5, 5, 0, 4])
a = np.array([13.2, 3.5, 3.5, -8.2, 3.0, 13.4, 99.2, -7.1, 0.0, 53.7])
# Aggregate sum for each group
result_sum = aggregate(group_idx, a, func='sum', fill_value=0)
print(f"Aggregated sum: {result_sum}")
# Expected: [10. -8.2 0. 26.6 53.7 92.1]
# Aggregate count of elements in each group
result_count = aggregate(group_idx, a, func='count', fill_value=0)
print(f"Aggregated count: {result_count}")
# Expected: [4 1 0 2 1 2]
# Aggregate mean of values in each group
result_mean = aggregate(group_idx, a, func='mean', fill_value=0)
print(f"Aggregated mean: {result_mean}")
# Expected: [ 2.5 -8.2 0. 13.3 53.7 46.05]
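As a cross-check of the Quickstart results, the same grouped sum, count, and mean can be reproduced with plain NumPy. This is only a sketch of what the aggregation computes, using `np.bincount`; it does not describe how `numpy-groupies` implements it, and the `ref_*` names are illustrative.

```python
import numpy as np

group_idx = np.array([3, 0, 0, 1, 0, 3, 5, 5, 0, 4])
a = np.array([13.2, 3.5, 3.5, -8.2, 3.0, 13.4, 99.2, -7.1, 0.0, 53.7])

# One output slot per group label from 0 to max(group_idx).
n_groups = group_idx.max() + 1

# Per-group sum: bincount with weights adds a[i] into bin group_idx[i].
ref_sum = np.bincount(group_idx, weights=a, minlength=n_groups)

# Per-group count: bincount without weights counts occurrences of each label.
ref_count = np.bincount(group_idx, minlength=n_groups)

# Per-group mean, leaving 0 for empty groups (mirrors fill_value=0).
ref_mean = np.divide(ref_sum, ref_count, out=np.zeros(n_groups), where=ref_count > 0)

print(ref_sum)
print(ref_count)
print(ref_mean)
```

Group 2 has no members, so its sum, count, and mean all stay at 0 here, matching the `fill_value=0` used in the Quickstart calls.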