GeoArrow Pandas
GeoArrow Pandas provides GeoArrow extension types for pandas DataFrames, enabling efficient storage and manipulation of geospatial data in the Apache Arrow memory format. It integrates with pandas by implementing the ExtensionArray and ExtensionDtype protocols. The current version is 0.1.1. The library is experimental, still under development, and released infrequently.
Common errors
- TypeError: cannot convert '...' to GeoArrowExtensionDtype
  cause: The data passed to `pd.Series` (or a similar constructor) is not in a format that `geoarrow-pandas` can convert directly into a GeoArrow extension type.
  fix: Ensure your input data is a list of `shapely` geometries, WKB/WKT strings, or a compatible PyArrow array. Specify the GeoArrow dtype explicitly, e.g., `pd.Series([...], dtype=ga.wkb_type())` or `pd.Series([...], dtype=ga.point_type(pyarrow.float64()))`.
- AttributeError: 'GeoArrowExtensionArray' object has no attribute 'area'
  cause: You are calling a geometry-specific method (such as `.area` or `.centroid`) directly on the `GeoArrowExtensionArray` or its accessor, expecting GeoPandas-like behavior.
  fix: GeoArrow Pandas provides a memory-efficient backbone but does not replicate all GeoPandas geometry methods. If you need these methods, first convert your data to a `geopandas.GeoDataFrame` or `GeoSeries` using `ga.to_geopandas()`, or use lower-level functions from the `geoarrow` library itself for spatial operations.
- ModuleNotFoundError: No module named 'geoarrow.pandas'
  cause: The `geoarrow-pandas` library is not installed in your current Python environment.
  fix: Install the library using pip: `pip install geoarrow-pandas`.
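The first error above usually comes down to what is in the input list. A minimal sketch of a pre-check that could run before building a Series is shown here. It uses only the standard library and covers only the WKB-bytes and WKT-text cases. `classify_geometry_inputs` is a hypothetical helper for illustration, not part of geoarrow-pandas, and the WKT test is deliberately coarse (it looks only at the leading keyword).

```python
from typing import Iterable

# Leading keywords of the common WKT geometry types.
# This table and the helper below are illustrative assumptions, not library API.
_WKT_KEYWORDS = (
    "POINT", "LINESTRING", "POLYGON",
    "MULTIPOINT", "MULTILINESTRING", "MULTIPOLYGON",
    "GEOMETRYCOLLECTION",
)


def classify_geometry_inputs(values: Iterable[object]) -> str:
    """Return "wkb" or "wkt" if every non-null value matches one encoding.

    Raises TypeError early, with the offending position, otherwise.
    """
    kinds = set()
    for i, value in enumerate(values):
        if value is None:
            continue  # nulls are compatible with either encoding
        if isinstance(value, (bytes, bytearray, memoryview)):
            kinds.add("wkb")
        elif isinstance(value, str) and value.lstrip().upper().startswith(_WKT_KEYWORDS):
            kinds.add("wkt")
        else:
            raise TypeError(f"item {i} is neither WKB bytes nor WKT text: {value!r}")
    if len(kinds) > 1:
        raise TypeError("inputs mix WKB bytes and WKT text; pick one encoding")
    if not kinds:
        raise TypeError("no non-null geometries to classify")
    return kinds.pop()


print(classify_geometry_inputs(["POINT (1 2)", None, "LINESTRING (0 0, 1 1)"]))
```

Running a check like this first turns a late conversion failure into an error message that names the offending item.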
Warnings
- breaking As a '0.x' experimental library, the API of geoarrow-pandas is subject to change without strict backward compatibility guarantees. Expect potential modifications to class names, function signatures, and overall structure in future releases.
- gotcha Direct conversion of large GeoArrow-backed DataFrames or Series to GeoPandas (e.g., via `ga.to_geopandas`) can be computationally intensive and memory-hungry, as it often involves copying data and converting it to shapely objects.
- gotcha A GeoArrow-backed Series (GeoArrowExtensionDtype) does not expose the full set of geometry methods and attributes that a `geopandas.GeoSeries` does (e.g., `.area`, `.centroid`, `.crs`).
- gotcha Attempting to create a GeoArrowExtensionDtype Series with incompatible Python objects or incorrect `pyarrow` types can lead to `TypeError` or `ValueError`.
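Because a 0.x API can change between releases, it can help to check at startup which release is actually installed. The following is a rough sketch using only `importlib.metadata` from the standard library. The `TESTED_SERIES` value and both helper functions are assumptions made for illustration.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version: str) -> Tuple[int, int]:
    """Extract the leading "major.minor" pair from a version such as "0.1.1"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


# The release series this (hypothetical) project was tested against.
TESTED_SERIES = (0, 1)

found = installed_version("geoarrow-pandas")
if found is None:
    print("geoarrow-pandas is not installed")
elif major_minor(found) != TESTED_SERIES:
    print(f"geoarrow-pandas {found} is outside the tested 0.1.x series")
else:
    print(f"geoarrow-pandas {found} matches the tested series")
```

A version constraint in the project's requirements, such as `geoarrow-pandas>=0.1,<0.2`, serves the same purpose at install time.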
Install
- pip install geoarrow-pandas
Imports
- geoarrow.pandas
import geoarrow.pandas as ga
- GeoArrowExtensionDtype
from geoarrow.pandas.types import GeoArrowExtensionDtype
from geoarrow.pandas import GeoArrowExtensionDtype
Quickstart
import pandas as pd
import geoarrow.pandas as ga
import shapely
# Create a pandas Series with GeoArrow-backed WKB geometries
s = pd.Series([shapely.Point(1, 2), shapely.Point(3, 4)], dtype=ga.wkb_type())
print("GeoArrow-backed Series:")
print(s)
print(f"Series dtype: {s.dtype}")
# Example conversion to GeoPandas (requires the optional geopandas dependency):
# try:
#     import geopandas
#     gdf = ga.to_geopandas(s.to_frame(name='geometry'))
#     print('\nConverted to GeoPandas GeoDataFrame:')
#     print(gdf)
# except ImportError:
#     print('\nSkipping GeoPandas conversion (geopandas not installed).')
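The quickstart stores its points as WKB. As an illustration of what such bytes contain, this sketch encodes and decodes a 2D point using only the standard library. It assumes the standard little-endian WKB layout for a point (byte-order flag, geometry type, x, y). `point_wkb` and `decode_point_wkb` are hypothetical helpers, not part of geoarrow-pandas.

```python
import struct
from typing import Tuple

WKB_LITTLE_ENDIAN = 1  # byte-order flag
WKB_POINT = 1          # geometry type code for Point


def point_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def decode_point_wkb(buf: bytes) -> Tuple[float, float]:
    """Decode a 2D point from WKB, honouring the byte-order flag."""
    endian = "<" if buf[0] == WKB_LITTLE_ENDIAN else ">"
    geometry_type, x, y = struct.unpack(endian + "Idd", buf[1:21])
    if geometry_type != WKB_POINT:
        raise ValueError(f"not a 2D point (geometry type {geometry_type})")
    return x, y


wkb_values = [point_wkb(1.0, 2.0), point_wkb(3.0, 4.0)]
print(wkb_values[0].hex())  # 0101000000000000000000f03f0000000000000040
print([decode_point_wkb(b) for b in wkb_values])  # [(1.0, 2.0), (3.0, 4.0)]
```

A list of `bytes` values like `wkb_values` is the kind of WKB input the error notes above refer to.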