Dask Histogram

2026.2.0 · active · verified Thu Apr 16

dask-histogram provides parallel and out-of-core histogramming by integrating Dask with the boost-histogram library. It lets you compute histograms on datasets that may not fit into memory, using Dask's task scheduling to fill one partial histogram per chunk and then combine them. The current version is 2026.2.0, and releases are frequent, often monthly or bi-monthly.
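Out-of-core histogramming works because histograms with identical bin edges are additive: the counts from each chunk can be summed to give the histogram of the whole dataset. The sketch below illustrates that idea with plain NumPy. It is a conceptual example only, not dask-histogram's implementation, and the helper name `chunked_histogram` is hypothetical.

```python
import numpy as np


def chunked_histogram(chunks, bins, value_range):
    """Sum per-chunk histograms so only one chunk is held in memory at a time."""
    edges = np.linspace(value_range[0], value_range[1], bins + 1)
    total = np.zeros(bins, dtype=np.int64)
    for chunk in chunks:
        # Every chunk is binned against the same edges, so counts can be added
        chunk_counts, _ = np.histogram(chunk, bins=edges)
        total += chunk_counts
    return total, edges


rng = np.random.default_rng(seed=0)
data = rng.normal(size=100_000)

# Simulate ten partitions of a larger dataset; the generator yields them lazily
partitions = (data[i:i + 10_000] for i in range(0, data.size, 10_000))
counts, edges = chunked_histogram(partitions, bins=50, value_range=(-5, 5))

# Reference: histogram of the full array using the same bin edges
reference, _ = np.histogram(data, bins=edges)
print(np.array_equal(counts, reference))
```

Dask applies the same pattern, except that the per-chunk fills become tasks that can run in parallel and the summation is a tree reduction over the partial results.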

Common errors

Warnings

Install

Imports

Quickstart

Demonstrates two ways to histogram a Dask array: a NumPy-like routine, and a boost-histogram object that defines the axes and is filled from the Dask array. Both results are lazy, so nothing is calculated until you explicitly compute them (for example with `.compute()`).

import boost_histogram as bh
import dask
import dask.array as da
from dask_histogram import factory
from dask_histogram.routines import histogram

# Create a large Dask array (10 chunks of 1,000,000 samples each)
x = da.random.normal(0, 1, size=(10_000_000,), chunks=1_000_000)

bins = 50
range_min, range_max = -5, 5

# Method 1: NumPy-like interface
# By default this returns a lazy (counts, edges) pair, mirroring np.histogram
counts, edges = histogram(x, bins=bins, range=(range_min, range_max))

print(f"NumPy-like Dask histogram (lazy): {counts}")
# dask.compute evaluates lazy collections and passes plain objects through
computed_counts, computed_edges = dask.compute(counts, edges)
print(f"Computed histogram (NumPy-like): {computed_counts}")

# Method 2: boost-histogram-like interface
# An empty boost-histogram object serves as the reference for axes and storage
bh_hist = bh.Histogram(bh.axis.Regular(bins, range_min, range_max, metadata="x"))

# factory builds a lazy histogram collection filled from the Dask array
dask_hist_bh_like = factory(x, histref=bh_hist)

print(f"boost-histogram-like Dask histogram (lazy): {dask_hist_bh_like}")
# .compute() returns a filled boost_histogram.Histogram
computed_hist_bh_like = dask_hist_bh_like.compute()
print(f"Computed histogram (boost-histogram-like): {computed_hist_bh_like.view()}")
