Dask Expressions

2.0.0 · active · verified Thu Apr 09

Dask-expr provides a high-level expression system for Dask DataFrames, focusing on query optimization and improved organization. It became the default backend for `dask.dataframe` since Dask version 2024.3.0. The library, currently at version 2.0.0 (released January 21, 2025), is primarily maintained as part of the main Dask project, with its separate GitHub repository no longer actively maintained.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to use Dask DataFrame, which leverages the dask-expr expression system internally for query optimization since Dask version 2024.3.0. No explicit `dask_expr` import is typically needed for standard DataFrame operations.

import dask.dataframe as dd

# Create a Dask DataFrame (internally uses the dask-expr system)
df = dd.from_dict({'a': range(1000), 'b': [f'cat_{i%5}' for i in range(1000)]}, npartitions=4)

# Perform some operations
result = df.groupby('b')['a'].mean()

# Compute the result
print(result.compute())

# To see the optimized query plan (requires graphviz to be installed)
# try:
#    df.optimize().explain()
# except ImportError:
#    print("Install graphviz to visualize the query plan: pip install 'graphviz'")

view raw JSON →