{"id":5937,"library":"flox","title":"Fast GroupBy operations for Dask Arrays","description":"Flox is a Python library that provides strategies for fast GroupBy reductions with dask.array, significantly enhancing performance for operations like climatologies, resampling, and histogramming. It was formerly known as `dask_groupby` and integrates seamlessly with xarray to offer more performant GroupBy and Resampling operations.","status":"active","version":"0.11.2","language":"en","source_language":"en","source_url":"https://github.com/pydata/flox","tags":["dask","xarray","groupby","array","scientific-computing","data-analysis","parallel-computing"],"install":[{"cmd":"pip install flox","lang":"bash","label":"Install stable release"}],"dependencies":[{"reason":"Core functionality relies on Dask arrays for parallel computing.","package":"dask"},{"reason":"Provides enhanced GroupBy and Resampling operations for Xarray objects.","package":"xarray","optional":true},{"reason":"Wraps vectorized implementations for nD array reductions.","package":"numpy_groupies","optional":false}],"imports":[{"symbol":"groupby_reduce","correct":"from flox import groupby_reduce"},{"note":"For Xarray objects and lazy grouping by Dask arrays.","symbol":"xarray_reduce","correct":"from flox.xarray import xarray_reduce"}],"quickstart":{"code":"import dask.array as da\nfrom flox import groupby_reduce\nimport numpy as np\n\n# Create a sample Dask array; the last axis (length 1000) is the one being grouped\ndata = da.random.random((10, 1000), chunks=(10, 100))\n\n# Create a 'by' array for grouping (e.g., categories 0-9); it must align with the trailing axis of data\ngroups = np.random.randint(0, 10, size=1000)\n\n# Perform a GroupBy reduction (e.g., mean)\nresult_mean, group_labels = groupby_reduce(\n    data, groups, func=\"mean\", expected_groups=np.arange(10)\n)\n\n# The result has shape (10, 10): one row per input row, one column per group\nprint(\"Grouped Means (first 5 groups):\\n\", result_mean.compute()[:, :5])\nprint(\"Group Labels:\\n\", group_labels)","lang":"python","description":"This quickstart demonstrates how to use `flox.groupby_reduce` with a Dask array and a 
NumPy array of group labels to compute the mean for each group. The `expected_groups` argument ensures all groups are present in the output, even if some are empty."},"warnings":[{"fix":"Update imports from `dask_groupby` to `flox` and adjust any related API calls.","message":"The library was previously known as `dask_groupby`. Code relying on the old package name or import paths will break.","severity":"breaking","affected_versions":"<0.1.0 (pre-rename)"},{"fix":"Be aware that Xarray's GroupBy methods might be leveraging `flox`. To debug performance or unexpected behavior, disable it with `xarray.set_options(use_flox=False)` or temporarily uninstall `flox` to revert to Xarray's default GroupBy engine.","message":"When `flox` is installed, Xarray (version >= 2022.06.0) uses `flox` by default for its `.groupby`, `.groupby_bins`, and `.resample` operations. This implicit usage can change performance characteristics or expose underlying `flox` issues.","severity":"gotcha","affected_versions":"xarray>=2022.06.0 with flox installed"},{"fix":"Rely on the built-in reduction functions (e.g., 'mean', 'sum') first. If custom logic is required, thoroughly test its behavior and consult documentation/issues for known limitations.","message":"Custom reductions specified using `Aggregation` instances might not be fully functional, or might have undefined behavior, in certain scenarios.","severity":"gotcha","affected_versions":"All current versions (0.11.2)"},{"fix":"Monitor Dask dashboard for task graph progression. 
Consider rechunking strategies, explicit method selection (e.g., `method='map-reduce'`), or breaking down computations into smaller steps if memory issues persist.","message":"High memory usage can occur with `flox` aggregations in Dask, particularly when lower-level tasks (e.g., data loading) continue running while higher-level reduction tasks are uncomputed.","severity":"gotcha","affected_versions":"All current versions (0.11.2)"},{"fix":"If performance is not as expected, experiment with `method` and `reindex` arguments (e.g., `method='map-reduce'`, `method='blockwise'`, `method='cohorts'`) to find the best strategy for your data.","message":"For Dask arrays, `flox` uses heuristics (since v0.9.0) to choose the optimal parallel algorithm (`map-reduce`, `blockwise`, `cohorts`). While generally robust, specific data distributions or chunking patterns might benefit from explicitly setting the `method` parameter in `groupby_reduce` or `xarray_reduce`.","severity":"gotcha","affected_versions":">=0.9.0"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z","problems":[]}