Dask-GeoPandas
0.5.0, verified Fri May 01, auth: no, python
Parallel GeoPandas with Dask. Extends GeoPandas to work with Dask for parallel and distributed execution of geospatial operations on large datasets. Current version: 0.5.0. Release cadence: irregular, roughly 1-2 releases per year.
pip install dask-geopandas
Common errors
error ImportError: cannot import name 'read_file' from 'dask_geopandas'
cause An older version of dask-geopandas may not provide read_file, or the import used a different path.
fix Upgrade to the latest version: `pip install --upgrade dask-geopandas`. If the import still fails, use `from dask_geopandas.io import read_file`.
error TypeError: 'GeoDataFrame' object does not support item assignment
cause Values were assigned to a Dask GeoDataFrame as if it were an in-memory GeoPandas GeoDataFrame.
fix Use `.assign()` or `map_partitions()` to modify columns lazily.
error distributed.utils_test - ValueError: Input data has no spatial partitions set. Call `ddf = ddf.spatial_shuffle()` first.
cause A spatial join or other spatial operation requires spatial partitions to be set.
fix Call `ddf = ddf.spatial_shuffle()` before performing spatial operations such as `sjoin()`.
error ModuleNotFoundError: No module named 'dask_expr'
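cause Dask's new query planning (dask >=2024.3.0) requires dask-expr, which may not be installed.
fix Install `dask[dataframe]`, which includes dask-expr: `pip install "dask[dataframe]"`. The quotes stop shells such as zsh from treating the square brackets as a glob pattern.
Warnings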
deprecated The `geom_almost_equals` method has been removed in v0.5.0. Use `geom_equals_exact` instead.
fix Replace calls to `ddf.geom_almost_equals` with `ddf.geom_equals_exact`, passing an explicit tolerance.
breaking Shapely >=2 is required; PyGEOS support was removed in v0.4.0.
fix Ensure Shapely >=2 is installed (`pip install "shapely>=2"`; without the quotes the shell treats `>` as an output redirect). Uninstall PyGEOS if present.
gotcha `spatial_shuffle` may produce incorrect results if the meta object from `read_file` is not set correctly. Use dask-geopandas >=0.4.2 to avoid this bug.
fix Upgrade to dask-geopandas >=0.4.2.
gotcha When using Dask's new query planning (dask >=2024.3.0), dask-expr must be installed to avoid errors. It is installed automatically with `dask[dataframe]`.
fix Install dask[dataframe] or dask-expr: `pip install "dask[dataframe]"`.
breaking Dask-GeoPandas requires Python >=3.10 as of v0.5.0.
fix Upgrade Python to 3.10 or later.
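The version requirements listed above can be checked from Python before importing anything heavy. The following is a minimal sketch using only the standard library; the helper names `parse_version`, `installed_version`, and `check_environment` are hypothetical and not part of dask-geopandas, and the thresholds are simply the ones stated in the warnings above.

```python
import re
import sys
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(dist_name):
    """Return the parsed version of an installed distribution, or None if absent."""
    try:
        return parse_version(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Collect human-readable problems; an empty list means none were found."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python >=3.10 is required as of dask-geopandas 0.5.0")

    shapely_version = installed_version("shapely")
    if shapely_version is None or shapely_version < (2,):
        problems.append("Shapely >=2 is required")

    if installed_version("pygeos") is not None:
        problems.append("PyGEOS is installed but no longer supported")

    dgp_version = installed_version("dask-geopandas")
    if dgp_version is None:
        problems.append("dask-geopandas is not installed")
    elif dgp_version < (0, 4, 2):
        problems.append("dask-geopandas <0.4.2 has the spatial_shuffle meta bug")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The version parser deliberately ignores pre-release suffixes such as `rc1`, which is enough for these coarse lower-bound checks but not a substitute for a full version-specifier library.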
Imports
- GeoDataFrame
wrong `from dask_geopandas import GeoDataFrame as dask_gdf`
correct `from dask_geopandas import GeoDataFrame`
- from_delayed
wrong `from dask_geopandas.io import from_delayed`
correct `from dask_geopandas import from_delayed`
- read_file
wrong `import dask_geopandas; dask_geopandas.read_file()`
correct `from dask_geopandas import read_file`
Quickstart
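from dask_geopandas import read_file
# Read a GeoJSON file lazily, split into 4 partitions.
# read_file expects either npartitions or chunksize to be given.
ddf = read_file('path/to/file.geojson', npartitions=4)
# head() computes only the rows it needs from the first partition.
print(ddf.head())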