{"id":8064,"library":"dask-histogram","title":"Dask Histogram","description":"dask-histogram provides parallel and out-of-core histogramming by integrating Dask with the boost-histogram library. It computes histograms efficiently on datasets too large to fit into memory, using Dask's distributed computing framework. The current version is 2026.2.0, and new releases typically appear every one to two months.","status":"active","version":"2026.2.0","language":"en","source_language":"en","source_url":"https://github.com/dask-contrib/dask-histogram","tags":["dask","histogram","parallel-computing","data-analysis","physics","boost-histogram"],"install":[{"cmd":"pip install dask-histogram","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core dependency for parallel computing.","package":"dask","optional":false},{"reason":"Numerical computing backend for Dask arrays.","package":"numpy","optional":false},{"reason":"Underlying histogramming library providing efficient C++ core.","package":"boost-histogram","optional":false},{"reason":"Provides backports of features from Python's typing module.","package":"typing_extensions","optional":false},{"reason":"Used for specific optimizations and features when handling awkward arrays.","package":"dask-awkward","optional":true}],"imports":[{"symbol":"Histogram","correct":"from dask_histogram.boost import Histogram"},{"symbol":"histogram","correct":"from dask_histogram.routines import histogram"}],"quickstart":{"code":"import dask.array as da\nimport dask_histogram.boost as dhb\nfrom dask_histogram.routines import histogram\n\n# Create a large Dask array\nx = da.random.normal(0, 1, size=(10_000_000,), chunks=1_000_000)\n\n# Method 1: NumPy-like interface; returns lazy counts plus the bin edges\nbins = 50\nrange_min, range_max = -5, 5\ncounts, edges = histogram(x, bins=bins, range=(range_min, range_max))\n\nprint(f\"NumPy-like Dask histogram counts (lazy): {counts}\")\ncomputed_counts = counts.compute()\nprint(f\"Computed counts (NumPy-like): {computed_counts}\")\n\n# Method 2: boost-histogram-like interface\ndask_hist_bh_like = dhb.Histogram(\n    dhb.axis.Regular(bins, range_min, range_max, metadata=\"x\"),\n    storage=dhb.storage.Double(),\n)\ndask_hist_bh_like.fill(x)  # stages the fill lazily; nothing is computed yet\n\nprint(f\"boost-histogram-like Dask histogram (lazy): {dask_hist_bh_like}\")\ncomputed_hist_bh_like = dask_hist_bh_like.compute()\nprint(f\"Computed histogram (boost-histogram-like): {computed_hist_bh_like.view()}\")","lang":"python","description":"Demonstrates creating Dask histograms with the NumPy-like `histogram` routine and with the boost-histogram-like `dask_histogram.boost.Histogram` class filled from a Dask array. Call `.compute()` to obtain concrete results."},"warnings":[{"fix":"Always call `.compute()` on the Dask histogram object to get the final result. Example: `final_histogram = dask_histogram_obj.compute()`","message":"Dask histograms are lazy. Creating or filling one returns a Dask collection that must be explicitly computed with `.compute()` to obtain the final boost-histogram object with actual results. Without `.compute()`, you are working with a Dask task graph, not the histogram data itself.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Keep `dask-histogram` updated to its latest version to ensure compatibility with recent Dask releases. Check release notes for specific Dask version requirements.","message":"Compatibility with Dask versions can be sensitive. For instance, `dask_histogram.factory` functionality was broken with `dask>=2024.12.0` and required an update in `dask-histogram==2024.12.0` to fix. 
Ensure your `dask-histogram` version is compatible with your `dask` version, especially after major Dask releases.","severity":"breaking","affected_versions":"Prior to 2024.12.0 when used with dask>=2024.12.0"},{"fix":"Ensure all data intended for filling a `dask_histogram.boost.Histogram` object are Dask arrays. Convert NumPy arrays to Dask arrays first (e.g., `da.from_array(my_numpy_array)`).","message":"When using `dask_histogram.boost.Histogram.fill()`, the arguments (e.g., `x`, `y`) must be Dask arrays, not raw NumPy arrays or scalar values, unlike `boost-histogram`'s direct `fill()` method. This is a common mistake when migrating from `boost-histogram` to `dask-histogram`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Adjust any custom graph introspection logic to account for delayed graph creation. The graph is fully formed only after `.compute()` is invoked or another Dask operation triggers graph building.","message":"The internal Dask graph construction for `Histogram.fill()` was optimized in version `2024.3.0` to delay the creation of the task graph until `.compute()` is called. This can affect users who were relying on inspecting the Dask graph immediately after calling `fill()` but before `compute()`.","severity":"gotcha","affected_versions":">=2024.3.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Call `.compute()` on the DaskHistogram object first to get the concrete boost-histogram instance. Example: `computed_hist = dask_hist_obj.compute(); print(computed_hist.view())`","cause":"Attempting to access histogram data attributes (like 'values', 'counts', 'sum_weights') on a lazy DaskHistogram object before calling `.compute()`.","error":"AttributeError: 'DaskHistogram' object has no attribute 'values'"},{"fix":"Ensure all objects passed into Dask operations or stored within the Dask graph are serializable. 
This often requires careful construction of custom functions or `boost-histogram` axis definitions. Sometimes restarting the Dask client or environment can resolve transient pickling issues.","cause":"A non-serializable object (like a lock or certain complex Python objects) was inadvertently included in the Dask graph, making it impossible to send across processes in a distributed Dask setup.","error":"TypeError: cannot pickle '_thread.RLock' object"},{"fix":"Review your histogram's axis definitions and the data arrays passed to `Histogram.fill()`. An N-dimensional histogram needs N 1D arrays, one per axis.","cause":"The number of data arrays provided to `Histogram.fill()` does not match the number of axes defined in the underlying `boost-histogram` object.","error":"ValueError: Mismatched number of dimensions in fill data"},{"fix":"Update `dask-histogram` to the latest version. If `factory` functions are still missing or problematic, consult the `dask-histogram` documentation for the current recommended way to create or manipulate histogram layers, as the API might have changed.","cause":"The `dask_histogram.factory` function was removed, refactored, or is incompatible with the installed Dask version, especially after Dask 2024.12.","error":"AttributeError: module 'dask_histogram' has no attribute 'factory'"}]}