{"id":624,"library":"dask","title":"Dask: Parallel PyData with Task Scheduling","description":"Dask is a flexible open-source Python library for parallel computing, enabling users to scale Python workflows from single machines to distributed clusters. It provides parallelized NumPy array, Pandas DataFrame, and Python list (Bag) objects, extending familiar interfaces to larger-than-memory or distributed environments. Dask maintains a frequent release cadence, typically releasing new versions monthly.","status":"active","version":"2026.3.0","language":"python","source_language":"en","source_url":"https://github.com/dask/dask/","tags":["parallel computing","distributed computing","data science","dataframe","array","etl","scalable python"],"install":[{"cmd":"pip install dask","lang":"bash","label":"Base Installation"},{"cmd":"pip install \"dask[complete]\"","lang":"bash","label":"With Common Dependencies (e.g., pandas, numpy, distributed)"},{"cmd":"conda install dask","lang":"bash","label":"Conda Installation"}],"dependencies":[{"reason":"Dask requires Python 3.10 or newer.","package":"python","optional":false},{"reason":"Required for efficient Parquet I/O and improved string type handling in DataFrames (as of Dask 2026.1.2).","package":"pyarrow","optional":false},{"reason":"Essential for dask.dataframe functionality, which mimics the pandas API.","package":"pandas","optional":true},{"reason":"Essential for dask.array functionality, which mimics the NumPy API.","package":"numpy","optional":true},{"reason":"Provides the distributed scheduler and client for multi-core or multi-machine execution.","package":"distributed","optional":true}],"imports":[{"symbol":"dask.array","correct":"import dask.array as da"},{"symbol":"dask.dataframe","correct":"import dask.dataframe as dd"},{"symbol":"dask.bag","correct":"import dask.bag as db"},{"symbol":"dask.delayed","correct":"from dask import delayed"},{"symbol":"dask.distributed.Client","correct":"from dask.distributed import Client"},{"symbol":"dask.distributed.LocalCluster","correct":"from dask.distributed import LocalCluster"}],"quickstart":{"code":"from dask.distributed import Client\nimport dask.dataframe as dd\nimport pandas as pd\n\n# 1. Start a local Dask cluster (optional, but recommended for actual parallelization)\n# Client() without arguments starts a LocalCluster by default\nclient = Client(n_workers=4, threads_per_worker=2, memory_limit='2GB')\nprint(f\"Dask Dashboard link: {client.dashboard_link}\")\n\n# 2. Create a Dask DataFrame from a large Pandas DataFrame or a collection of CSVs\n# For demonstration, let's create a large Pandas DataFrame first, then convert it\ndf_pandas = pd.DataFrame({\n    'A': range(10_000_000),\n    'B': [f'category_{i % 5}' for i in range(10_000_000)],\n    'C': [i * 1.5 for i in range(10_000_000)]\n})\n# client.nthreads() returns a dict mapping worker address -> thread count;\n# use the cluster's total thread count as the partition count\nddf = dd.from_pandas(df_pandas, npartitions=sum(client.nthreads().values()))\n\n# Alternatively, read from files directly (more common in real-world scenarios):\n# ddf = dd.read_csv('s3://my-bucket/data-*.csv')\n\n# 3. Perform some operations (these are lazy and build a task graph)\nresult = ddf.groupby('B')['C'].mean()\n\n# 4. Trigger computation and get the result (e.g., as a Pandas Series)\nprint(\"\\nComputing the result...\")\nfinal_result = result.compute()\n\nprint(\"\\nFinal Result (first 5 rows):\\n\", final_result.head())\n\n# Close the client and cluster\nclient.close()\n","lang":"python","description":"This quickstart demonstrates how to initialize a local Dask cluster, create a Dask DataFrame (either from an existing Pandas DataFrame or by reading data directly), perform a lazy computation, and then trigger the execution using `.compute()` to retrieve the final result. The `client.dashboard_link` provides a URL to the Dask diagnostic dashboard, which is invaluable for monitoring computation progress and performance."},"warnings":[{"fix":"Upgrade Python to version 3.10 or newer.","message":"Dask dropped support for Python 3.9 starting with the 2025.12.0 release; the 2025.11.x series is the last to support it. Users on older Python versions must upgrade to 3.10+.","severity":"breaking","affected_versions":">=2025.12.0"},{"fix":"`pip install \"pyarrow>=16.0\"` or `conda install \"pyarrow>=16.0\"`.","message":"A hard dependency on `pyarrow >= 16.0` was introduced in Dask 2026.1.2. Users must ensure PyArrow is updated to this minimum version.","severity":"breaking","affected_versions":">=2026.1.2"},{"fix":"Always append `.compute()` to Dask collection operations when you need the final result in local memory (e.g., as a Pandas DataFrame or NumPy array).","message":"Dask operations are 'lazy' and build a task graph without immediately executing computations. Users commonly forget to call `.compute()` (or `.persist()`, `.to_parquet()`, etc.) to trigger the actual work and retrieve results.","severity":"gotcha","affected_versions":"All"},{"fix":"Use Dask's built-in I/O functions (e.g., `dd.read_parquet()`, `da.from_zarr()`, `dd.read_csv()`) to load data directly into the Dask cluster, allowing Dask to manage the distributed loading and processing.","message":"Loading large Python objects (like a multi-GB Pandas DataFrame or NumPy array) into the client process and then passing them to Dask is highly inefficient and can cause out-of-memory errors on the client, because Dask must then serialize and send these large objects over the network.","severity":"gotcha","affected_versions":"All"},{"fix":"Aim for partition sizes between 100-300 MiB. Adjust `npartitions` or `chunksize` parameters during DataFrame/Array creation or repartitioning based on your data size and cluster resources. Monitor the Dask dashboard for memory usage and task duration.","message":"Incorrect partition (chunk) sizing in Dask DataFrames/Arrays is a common cause of performance bottlenecks and memory issues. Partitions that are too large can lead to worker OOMs, while partitions that are too small incur high scheduling overhead.","severity":"gotcha","affected_versions":"All"},{"fix":"To enable PyArrow strings, set `dask.config.set({\"dataframe.convert-string\": True})` before creating DataFrames. Be aware that full compatibility for all operations is an ongoing effort, and some operations might still require conversion to 'object' dtype.","message":"With Pandas 2.x/3.x, the introduction of PyArrow-backed string dtypes significantly impacts memory usage and performance. Dask DataFrame's default string behavior might still be 'object' dtype unless explicitly configured.","severity":"gotcha","affected_versions":">=2023.3.1 (with Pandas >=2.0)"},{"fix":"Install required build tools before attempting to install Dask and its dependencies. For Alpine Linux, use 'apk add build-base python3-dev'.","message":"Building wheels for certain Dask dependencies (like lz4, numexpr, etc.) requires a C compiler (e.g., gcc). In minimal environments (like Alpine Linux or slim Docker images), these build tools are often not pre-installed, leading to installation failures.","severity":"breaking","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-05-12T16:53:24.734Z","next_check":"2026-06-26T00:00:00.000Z","problems":[{"fix":"Rename your local Python file (e.g., `dask.py`) to something else that does not conflict with Dask's module names. Then, clear any cached Python bytecode (`__pycache__` directories) and restart your Python interpreter or IDE.","cause":"This error typically occurs when a Python file in your project or current directory is inadvertently named `dask.py` (or `distributed.py`, `dask_dataframe.py`, etc.), shadowing the actual Dask library or its submodules. Python tries to import from your local file instead of the installed library.","error":"ModuleNotFoundError: No module named 'dask.dataframe'; 'dask' is not a package"},{"fix":"Ensure you have explicitly imported the specific Dask submodule you intend to use (e.g., `import dask.array as da` for Dask arrays or `import dask.dataframe as dd` for Dask DataFrames). If the issue persists, update Dask and its dependencies (`pip install --upgrade dask distributed 'dask[complete]'`) or reinstall in a clean environment.","cause":"This error usually arises when attempting to access a Dask submodule (like `dask.array` or `dask.dataframe`) without explicitly importing it first, or if there's a version mismatch/incomplete installation of Dask or its dependencies.","error":"AttributeError: module 'dask' has no attribute 'array'"},{"fix":"Verify that your Dask scheduler is running and accessible at the specified address and port (default 8786 for the scheduler, 8787 for the dashboard). If running locally, ensure no other process is using port 8786. If on a cluster, check network configurations and firewalls. Often, simply starting a `LocalCluster` or ensuring your `Client` points to an active scheduler will resolve it, e.g., `client = Client()` to start a local cluster automatically, or `client = Client('tcp://<scheduler_ip>:8786')` with the correct scheduler address.","cause":"This error indicates that your Dask client cannot establish a connection to the Dask scheduler, most commonly because the scheduler process is not running, is running on a different address/port, or a firewall is blocking the connection.","error":"ConnectionRefusedError: [Errno 111] Connection refused"},{"fix":"Provide an explicit `meta` (metadata) argument to Dask operations like `map_partitions`, `apply`, or `groupby().apply()`. The `meta` argument should be an empty Pandas object (DataFrame or Series) with the correct column names and dtypes matching the expected output. For example, `ddf.apply(my_func, meta=pd.Series(dtype='float64'))` or `ddf.map_partitions(my_func, meta={'col1': 'int64', 'col2': 'object'})`.","cause":"Dask performs 'lazy evaluation' and needs to know the structure (column names, dtypes) of the output of your operations before computing. This `ValueError` occurs when Dask's inferred metadata for a custom function (especially with `map_partitions` or `apply`) doesn't match the actual output of that function, or if metadata cannot be inferred at all.","error":"ValueError: Metadata inference failed in ... (or 'The columns in the computed data do not match the columns in the provided metadata')"},{"fix":"Reduce the size of partitions by rechunking or repartitioning your Dask collections (`.rechunk()` for Arrays, `.repartition()` for DataFrames), use more workers with a smaller `memory_limit` per worker, or increase the `memory_limit` for your workers if your machine has available RAM. Analyze your Dask dashboard to identify memory-intensive tasks. Use `dask.persist()` judiciously and release references to persisted collections when they are no longer needed, and check for 'unmanaged memory' as described in Dask's diagnostics. Ensure intermediate results are not being held in memory unnecessarily.","cause":"This warning (often leading to computation failures or hangs) indicates that a Dask worker has exceeded its allocated memory limit, causing the Nanny process to restart it. This usually happens when tasks consume too much memory, often due to loading large amounts of data into memory within a single task or inefficient operations.","error":"distributed.nanny.memory - WARNING - Worker tcp://127.0.0.1:... exceeded 95% memory budget. Restarting..."}],"ecosystem":"pypi","meta_description":null,"install_score":50,"install_tag":"draft","quickstart_score":0,"quickstart_tag":"stale","pypi_latest":"2026.3.0","install_checks":{"last_tested":"2026-05-12","tag":"draft","tag_description":"notable install failures or slow imports","results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"complete","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"36.9M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"complete","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"complete","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":15,"import_time_s":0.83,"mem_mb":24,"disk_size":"405M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":3.4,"import_time_s":null,"mem_mb":null,"disk_size":"38M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"complete","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.8,"mem_mb":24,"disk_size":"401M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"complete","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"44.4M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"complete","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"complete","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":14.4,"import_time_s":1.59,"mem_mb":25.9,"disk_size":"431M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":3.5,"import_time_s":null,"mem_mb":null,"disk_size":"46M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"complete","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.5,"mem_mb":25.9,"disk_size":"426M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"complete","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"34.4M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"complete","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"complete","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":14.4,"import_time_s":1.57,"mem_mb":22.3,"disk_size":"412M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":3.3,"import_time_s":null,"mem_mb":null,"disk_size":"36M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"complete","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.54,"mem_mb":22.3,"disk_size":"408M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"complete","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"34.2M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"complete","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"complete","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":14.9,"import_time_s":1.37,"mem_mb":22.5,"disk_size":"411M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":3.3,"import_time_s":null,"mem_mb":null,"disk_size":"35M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"complete","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.45,"mem_mb":22.5,"disk_size":"407M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"complete","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"33.9M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"complete","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"complete","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":17.7,"import_time_s":0.8,"mem_mb":19.3,"disk_size":"403M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":3.8,"import_time_s":null,"mem_mb":null,"disk_size":"35M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"complete","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.79,"mem_mb":19.3,"disk_size":"403M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null}]},"quickstart_checks":{"last_tested":"2026-04-24","tag":"stale","tag_description":"widespread failures or data too old to trust","results":[{"runtime":"python:3.10-alpine","exit_code":1},{"runtime":"python:3.10-slim","exit_code":1},{"runtime":"python:3.11-alpine","exit_code":1},{"runtime":"python:3.11-slim","exit_code":1},{"runtime":"python:3.12-alpine","exit_code":1},{"runtime":"python:3.12-slim","exit_code":1},{"runtime":"python:3.13-alpine","exit_code":1},{"runtime":"python:3.13-slim","exit_code":1},{"runtime":"python:3.9-alpine","exit_code":1},{"runtime":"python:3.9-slim","exit_code":1}]}}