{"id":8063,"library":"dask-awkward","title":"Dask Awkward","description":"dask-awkward combines Awkward Array's schema-driven, arbitrarily nested data structures with Dask's capabilities for parallel and out-of-core computation. It enables processing of complex, irregular data like physics event data or JSON records at scale. The library is actively maintained with frequent releases, typically on a monthly or bi-monthly cadence.","status":"active","version":"2026.2.1","language":"en","source_language":"en","source_url":"https://github.com/dask-contrib/dask-awkward","tags":["dask","awkward","array","distributed","big data","irregular data","nested data"],"install":[{"cmd":"pip install dask-awkward","lang":"bash","label":"Install core library"},{"cmd":"pip install 'dask-awkward[io]'","lang":"bash","label":"Install with optional I/O dependencies (e.g., pyarrow for Parquet)"}],"dependencies":[{"reason":"Core dependency for parallel and distributed computing.","package":"dask","optional":false},{"reason":"Core dependency for handling nested, irregular data structures.","package":"awkward","optional":false},{"reason":"Underlying array library for numerical operations.","package":"numpy","optional":false},{"reason":"Required for Parquet file I/O operations.","package":"pyarrow","optional":true},{"reason":"Required for HDF5 file I/O operations.","package":"h5py","optional":true}],"imports":[{"note":"Import Array directly from the top-level package.","wrong":"from dask_awkward.core import Array","symbol":"Array","correct":"from dask_awkward import Array"},{"note":"Commonly used functions like from_parquet are directly available under the top-level namespace.","wrong":"import dask_awkward.from_parquet","symbol":"from_parquet","correct":"from dask_awkward import from_parquet"}],"quickstart":{"code":"import dask_awkward as da\nimport awkward as ak\n\n# Create a small Awkward Array\n# This can be replaced by loading from a file, e.g., da.from_parquet()\ndata = 
ak.Array([{'x': 1, 'y': [1, 2]}, {'x': 2, 'y': []}, {'x': 3, 'y': [3]}])\n\n# Convert it to a Dask Awkward Array with 2 partitions\ndask_array = da.from_awkward(data, npartitions=2)\n\n# Perform a simple operation: count the elements of 'y' in each record (lazy)\nlengths = da.num(dask_array['y'], axis=1)\n\n# Compute the result\nresult = lengths.compute()\nprint(result)\n# Expected output: [2, 0, 1]","lang":"python","description":"This quickstart demonstrates creating a Dask-Awkward Array from an in-memory Awkward Array, performing a simple operation (getting the length of a nested list), and then computing the result. For real-world use, `da.from_parquet()` or `da.from_json()` are common entry points."},"warnings":[{"fix":"Upgrade your Python environment to 3.10 or newer. Alternatively, pin `dask-awkward<2026.2.0` for Python 3.9, or `<2025.3.0` for Python 3.8.","message":"Python 3.9 support was dropped with version 2026.2.0, and Python 3.8 support was dropped with version 2025.3.0. Users on older Python versions must upgrade or pin `dask-awkward` to an earlier compatible version.","severity":"breaking","affected_versions":">=2026.2.0 (for Py3.9), >=2025.3.0 (for Py3.8)"},{"fix":"Ensure that your `dask-awkward` and `dask` installations are compatible. Install `dask-awkward` without pinning `dask` to allow `pip` to resolve compatible versions, or consult `dask-awkward`'s documentation for tested `dask` ranges.","message":"Dask's internal APIs, such as `DataFrameTreeReduction` (removed) and `Task` specifications (changed), have evolved. This means specific `dask-awkward` versions require compatible `dask` versions. Running `dask-awkward` with an incompatible `dask` version can lead to `AttributeError` or other runtime errors.","severity":"breaking","affected_versions":"All versions, especially around 2024.12.x and 2025.2.x releases of dask-awkward"},{"fix":"Ensure that operations that require concrete values are performed *after* calling `.compute()` on your Dask-Awkward Array. 
If writing custom functions, use `dask_awkward.map_partitions` with functions designed to operate on individual Awkward Array partitions.","message":"Using a Dask-Awkward Array (a 'tracer' or symbolic representation) in contexts that expect an immediate, concrete Awkward Array value can result in `TracerConversionError` or `RuntimeError: Awkward Array tracer used in a concrete context where a value is required.` This often happens inside user-defined functions or operations that are not explicitly Dask-aware.","severity":"gotcha","affected_versions":">=2024.9.0"},{"fix":"If a scalar is expected, extract it explicitly after computing, e.g., `my_dask_array.compute()[0]`, once you are certain the result holds exactly one element.","message":"When performing filtering or indexing operations that are expected to return a single scalar value, a Dask-Awkward Array will often return a single-item array instead of a direct scalar value, requiring an extra step to extract the scalar.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install dask-awkward` to install the package.","cause":"The dask-awkward library has not been installed.","error":"ModuleNotFoundError: No module named 'dask_awkward'"},{"fix":"Ensure that `.compute()` is called on the Dask-Awkward Array before attempting to use it in operations that require an immediate, concrete Awkward Array. For custom functions, use `dask_awkward.map_partitions` so that the function runs on each concrete partition.","cause":"An operation tried to access the concrete data of a Dask-Awkward Array (which is a symbolic representation) before `.compute()` was called.","error":"RuntimeError: Awkward Array tracer used in a concrete context where a value is required."},{"fix":"Upgrade your Python environment to version 3.10 or newer. 
Alternatively, install an older, compatible version of dask-awkward, e.g., `pip install 'dask-awkward<2026.2.0'` for Python 3.9.","cause":"Attempting to install or use a recent version of dask-awkward on an unsupported Python version (e.g., Python 3.9 or older).","error":"ERROR: Package 'dask-awkward' requires a different Python: 3.9.x not in '>=3.10'"},{"fix":"Upgrade `dask-awkward` to the latest version with `pip install --upgrade dask-awkward` so that it is compatible with recent `dask` releases.","cause":"You are using an older `dask-awkward` version with a newer `dask` version where certain internal Dask APIs have been removed or changed.","error":"AttributeError: module 'dask.dataframe.core' has no attribute 'DataFrameTreeReduction'"}]}