{"id":1442,"library":"dask-expr","title":"Dask Expressions","description":"Dask-expr provides a high-level expression system for Dask DataFrames, focusing on query optimization and improved organization. It has been the default backend for `dask.dataframe` since Dask version 2024.3.0. The library, currently at version 2.0.0 (released January 21, 2025), is now maintained primarily as part of the main Dask project; its separate GitHub repository is no longer actively maintained.","status":"active","version":"2.0.0","language":"en","source_language":"en","source_url":"https://github.com/dask/dask","tags":["dask","dataframe","query-optimization","distributed-computing","big-data"],"install":[{"cmd":"pip install dask-expr","lang":"bash","label":"PyPI"},{"cmd":"conda install conda-forge::dask-expr","lang":"bash","label":"Conda-forge"}],"dependencies":[{"reason":"Core Dask functionality; dask-expr is the default backend for dask.dataframe.","package":"dask","optional":false},{"reason":"Dask-expr requires pandas version 2.0 or higher.","package":"pandas","optional":false},{"reason":"Often used for efficient data interchange and I/O with Parquet files.","package":"pyarrow","optional":true},{"reason":"For reading from and writing to various file systems (e.g., S3, GCS).","package":"fsspec","optional":true}],"imports":[{"note":"Importing dask_expr directly works, but because dask-expr is now the default backend for dask.dataframe, an explicit import is rarely needed for basic DataFrame operations.","symbol":"dask_expr","correct":"import dask_expr as dx"},{"note":"The expression system has been used implicitly when working with dask.dataframe since Dask 2024.3.0.","symbol":"dask.dataframe","correct":"import dask.dataframe as dd"}],"quickstart":{"code":"import dask.dataframe as dd\n\n# Create a Dask DataFrame (internally uses the dask-expr system)\ndf = dd.from_dict({'a': range(1000), 'b': [f'cat_{i%5}' for i in range(1000)]}, npartitions=4)\n\n# Perform some operations\nresult = df.groupby('b')['a'].mean()\n\n# Compute the result\nprint(result.compute())\n\n# To see the optimized query plan (requires graphviz to be installed)\n# try:\n#    df.optimize().explain()\n# except ImportError:\n#    print(\"Install graphviz to visualize the query plan: pip install 'graphviz'\")","lang":"python","description":"This quickstart demonstrates Dask DataFrame, which has used the dask-expr expression system internally for query optimization since Dask version 2024.3.0. No explicit `dask_expr` import is typically needed for standard DataFrame operations."},"warnings":[{"fix":"Ensure all dependencies are compatible with `pandas>=2`. You may need to upgrade or constrain other libraries if conflicts arise. Consider using isolated environments (conda, virtualenv) for different projects.","message":"The `dask-expr` library, particularly versions 0.5.1 and above, requires `pandas>=2`. This can lead to dependency conflicts if other libraries in your environment pin `pandas` to an older version (e.g., `<2`).","severity":"breaking","affected_versions":">=0.5.1"},{"fix":"For contributions or detailed technical understanding, refer to the main Dask repository and documentation, specifically the `dask.dataframe` sections. Install `dask-expr` explicitly only if your Dask version is older than 2024.3.0 and you want the query planning features; recent Dask versions install it by default.","message":"The `dask-expr` GitHub repository is no longer actively maintained: its implementation has been moved into the main `dask/dask` repository, and it is now the default backend for `dask.dataframe`. While the PyPI package still exists, new development and core maintenance happen within the Dask project itself.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use `persist()` sparingly, for example when the full dataset is genuinely needed by several subsequent complex operations. Before adding it, check whether it is truly required; the optimizer often handles intermediate computations efficiently without explicit persistence.","message":"Using `df.persist()` with the dask-expr query optimizer can sometimes block optimizations such as projection or filter pushdown into the I/O layer.","severity":"gotcha","affected_versions":"All versions where dask-expr is enabled"},{"fix":"If 'named GroupBy Aggregations' are critical for your workflow, you might need to structure your aggregations differently (e.g., performing multiple individual aggregations and then combining them) or, on Dask versions that still ship the legacy implementation, opt out of the dask-expr backend by setting `dask.config.set({'dataframe.query-planning': False})` before importing `dask.dataframe`.","message":"Dask-expr does not currently support 'named GroupBy Aggregations', a feature available in the legacy Dask DataFrame API.","severity":"gotcha","affected_versions":"All versions where dask-expr is enabled"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}