GeoArrow PyArrow
GeoArrow PyArrow (`geoarrow-pyarrow`) is a Python library for working with GeoArrow data through PyArrow, the Python bindings to the Arrow C++ library. It converts geospatial data (points, lines, polygons) to and from the GeoArrow format, building on Apache Arrow's columnar memory layout for high-performance processing. As of version 0.2.0, it focuses on the core GeoArrow type implementations and on interoperability with Shapely and PyArrow. Releases follow updates to the GeoArrow specification and core development.
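As an illustration of what "columnar" means for lines, GeoArrow's native (non-WKB) encodings store nested geometries as an offsets buffer over one flat coordinate sequence. The pure-Python sketch below mimics that layout for linestrings using separate x and y sequences. It does not call geoarrow-pyarrow, and the helpers `encode_linestrings` and `decode_linestrings` are hypothetical names chosen for this example.

```python
# Pure-Python sketch of GeoArrow's native linestring layout: an offsets buffer
# marks where each linestring starts in flat x/y coordinate sequences.
# `encode_linestrings` / `decode_linestrings` are hypothetical, not geoarrow APIs.
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]


def encode_linestrings(
    lines: Sequence[Sequence[Coord]],
) -> Tuple[List[int], List[float], List[float]]:
    """Return (offsets, xs, ys); linestring i spans offsets[i]:offsets[i + 1]."""
    offsets = [0]
    xs: List[float] = []
    ys: List[float] = []
    for line in lines:
        for x, y in line:
            xs.append(float(x))
            ys.append(float(y))
        # Each linestring ends where the next one begins.
        offsets.append(len(xs))
    return offsets, xs, ys


def decode_linestrings(
    offsets: List[int], xs: List[float], ys: List[float]
) -> List[List[Coord]]:
    """Invert encode_linestrings by slicing the flat buffers with the offsets."""
    return [
        list(zip(xs[start:end], ys[start:end]))
        for start, end in zip(offsets[:-1], offsets[1:])
    ]


lines = [[(0, 0), (1, 1)], [(2, 2), (3, 3), (4, 4)]]
offsets, xs, ys = encode_linestrings(lines)
print(offsets)  # [0, 2, 5]
```

Keeping all coordinates in contiguous buffers is what lets Arrow-based tools scan or slice geometries without touching per-object Python values.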
Common errors
- ModuleNotFoundError: No module named 'geoarrow'
  cause: The `geoarrow-pyarrow` package is not installed or not available in the current Python environment.
  fix: Install the package using pip: `pip install geoarrow-pyarrow`
- ImportError: cannot import name 'from_shapely' from 'geoarrow.pyarrow' (...)
  cause: Attempting to use the `0.1.x` API (e.g., direct `from_shapely` or `from_wkb` functions) with `geoarrow-pyarrow` version `0.2.0` or newer.
  fix: Update your code to use the new unified `geoarrow.pyarrow.array()` constructor, for example `ga.array(shapely_geometries)` or `ga.array(wkb_bytes_list)`.
- pyarrow.lib.ArrowInvalid: Could not find GeoArrow extension type for geoarrow.point.xy (or similar for other types)
  cause: Attempting to read a file (e.g., Parquet, Feather) containing GeoArrow extension types without first registering them in the current Python session.
  fix: Call `geoarrow.pyarrow.register_extension_types()` at the beginning of your script, before any file read operations that might involve GeoArrow data.
- ImportError: geoarrow.pyarrow requires pyarrow >= 10.0.0, but you have 9.0.0 (or similar version mismatch)
  cause: The installed `pyarrow` version is older than the minimum required by `geoarrow-pyarrow`.
  fix: Upgrade `pyarrow` to version 10.0 or higher: `pip install --upgrade pyarrow`
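The `ga.array(wkb_bytes_list)` fix above assumes you already have well-known binary (WKB) values. The sketch below builds little-endian WKB for 2D points with only the standard library, to show what those bytes contain. The helpers `point_wkb` and `point_from_wkb` are hypothetical; in practice Shapely's `shapely.to_wkb()` or another geometry library would usually produce these bytes.

```python
# Build ISO/OGC well-known binary (WKB) for 2D points using only the standard
# library. `point_wkb` / `point_from_wkb` are hypothetical illustration helpers.
import struct
from typing import List, Tuple


def point_wkb(x: float, y: float) -> bytes:
    # Layout: 1-byte order flag (1 = little endian), uint32 geometry type
    # (1 = Point), then two float64 coordinates: 1 + 4 + 8 + 8 = 21 bytes.
    return struct.pack("<BIdd", 1, 1, x, y)


def point_from_wkb(data: bytes) -> Tuple[float, float]:
    # Reverse of point_wkb; only handles little-endian 2D points.
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("expected a little-endian WKB Point")
    return x, y


wkb_bytes_list: List[bytes] = [point_wkb(1, 2), point_wkb(3, 4), point_wkb(5, 6)]
print(len(wkb_bytes_list[0]))  # 21
```

A list shaped like `wkb_bytes_list` is the kind of input the fix passes to `ga.array()`; whether a given geoarrow-pyarrow version accepts a plain Python list of bytes directly is an assumption worth checking against its documentation.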
Warnings
- breaking The GeoArrow PyArrow API underwent significant changes in version 0.2.0. Functions like `from_shapely`, `from_wkb`, and direct module-level constructors were replaced or consolidated.
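- gotcha GeoArrow stores Coordinate Reference System (CRS) information only as optional extension-type metadata; the geometry values themselves do not encode it. Arrays built from plain coordinates, WKT, or WKB have no CRS unless one is attached explicitly, so do not assume a CRS is present.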
- gotcha When loading data from files (e.g., Parquet, Feather) that contain GeoArrow extension types, you must register these types *before* reading the file. Otherwise, PyArrow will not recognize them and might load them as generic binary or list types.
- gotcha GeoArrow PyArrow has a strict dependency on `pyarrow >= 10.0`. Using an older version of PyArrow will likely result in `ImportError` or runtime crashes due to ABI incompatibilities.
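To catch the version mismatch before importing `geoarrow.pyarrow`, a small standard-library check can compare the installed PyArrow version against the 10.0 floor stated above. This is a minimal sketch, not part of geoarrow-pyarrow: `parse_major_minor` and `pyarrow_is_supported` are hypothetical helpers, and only the major and minor components are compared.

```python
# Check whether the installed PyArrow meets a minimum version before importing
# geoarrow.pyarrow. The (10, 0) floor mirrors the requirement stated above;
# the helper names are hypothetical and not part of geoarrow-pyarrow.
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    # Take the leading "MAJOR.MINOR" and ignore suffixes such as ".post1" or "rc1".
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def pyarrow_is_supported(
    version: Optional[str] = None, minimum: Tuple[int, int] = (10, 0)
) -> bool:
    # Look up the installed distribution when no version string is given;
    # a missing PyArrow counts as unsupported.
    if version is None:
        try:
            version = metadata.version("pyarrow")
        except metadata.PackageNotFoundError:
            return False
    return parse_major_minor(version) >= minimum


if not pyarrow_is_supported():
    print("Upgrade with: pip install --upgrade pyarrow")
```

Tuple comparison keeps the check simple; a project that already depends on `packaging` could use `packaging.version.Version` instead for full PEP 440 handling.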
Install
- pip install geoarrow-pyarrow
Imports
- geoarrow.pyarrow
from geoarrow import pyarrow as ga
import geoarrow.pyarrow as ga
- geoarrow.pyarrow.types
import geoarrow.pyarrow.types as gat
Quickstart
import geoarrow.pyarrow as ga
import shapely.geometry
import pyarrow as pa
# Create some Shapely Point geometries
points = [
    shapely.geometry.Point(1, 2),
    shapely.geometry.Point(3, 4),
    shapely.geometry.Point(5, 6),
]
# Convert Shapely geometries to a GeoArrow array using the unified constructor
geoarrow_array = ga.array(points)
print("GeoArrow Array Type:", geoarrow_array.type)
print("GeoArrow Array:", geoarrow_array)
# You can also convert it back to Shapely objects
shapely_roundtrip = geoarrow_array.to_shapely()
print("Shapely Roundtrip:", shapely_roundtrip)
# Integrate GeoArrow arrays into a PyArrow Table
table = pa.table({'id': [1, 2, 3], 'geometry': geoarrow_array})
print("\nPyArrow Table:")
print(table)
# It's crucial to register extension types when loading data from disk
# if the data contains GeoArrow types.
# ga.register_extension_types() # Uncomment if loading from file
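The quickstart prints the array's type but not the storage behind it. Per the GeoArrow specification, native points are stored either as separate x and y child arrays (a struct, the "separated" layout) or as one interleaved coordinate buffer (a fixed-size list of two doubles). The sketch below mimics both layouts in plain Python for the same three points; it makes no geoarrow-pyarrow calls, and the helper `interleaved_point` is hypothetical.

```python
# Pure-Python sketch of the two native GeoArrow point layouts for the three
# quickstart points. No geoarrow calls are made; `interleaved_point` is hypothetical.
from typing import List, Tuple

points_xy: List[Tuple[float, float]] = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]

# Separated layout: one contiguous buffer per dimension (struct of x, y).
xs = [x for x, _ in points_xy]
ys = [y for _, y in points_xy]

# Interleaved layout: a single buffer x0, y0, x1, y1, ...
interleaved = [value for xy in points_xy for value in xy]


def interleaved_point(buffer: List[float], i: int) -> Tuple[float, float]:
    # Point i occupies positions 2*i and 2*i + 1 of the interleaved buffer.
    return buffer[2 * i], buffer[2 * i + 1]


print(xs)           # [1.0, 3.0, 5.0]
print(interleaved)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

The separated layout suits per-dimension operations such as computing a bounding box from `xs` and `ys`, while the interleaved layout matches the coordinate order many C and GPU geometry libraries expect.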