Dask Awkward

2026.2.1 · active · verified Thu Apr 16

dask-awkward combines Awkward Array's schema-driven, arbitrarily nested data structures with Dask's capabilities for parallel and out-of-core computation. It enables processing of complex, irregular data like physics event data or JSON records at scale. The library is actively maintained with frequent releases, typically on a monthly or bi-monthly cadence.

Quickstart

This quickstart creates a Dask-Awkward Array from an in-memory Awkward Array, applies a simple lazy operation (counting the elements of the nested list `y` in each record), and then computes the result. For real-world use, `da.from_parquet()` and `da.from_json()` are the more common entry points.

import dask_awkward as da
import awkward as ak

# Create a small Awkward Array
# This can be replaced by loading from a file, e.g., da.from_parquet()
data = ak.Array([{'x': 1, 'y': [1, 2]}, {'x': 2, 'y': []}, {'x': 3, 'y': [3]}])

# Convert it to a Dask Awkward Array with 2 partitions
dask_array = da.from_awkward(data, npartitions=2)

# Perform a simple operation: count the elements of 'y' in each record.
# A lazy collection has no concrete .layout to inspect, so use the num() reduction instead.
lengths = da.num(dask_array['y'], axis=1)

# Compute the result
result = lengths.compute()
print(result)
# Expected output: [2, 0, 1]
