GeoArrow Types
The `geoarrow-types` package defines Python types for the GeoArrow specification, primarily as PyArrow extension types. It provides a foundational layer for representing geospatial data in the Apache Arrow columnar format, which enables interoperability and high-performance geospatial data processing. The current version is 0.3.0, and its minor releases are aligned with the broader GeoArrow Python ecosystem.
Common errors
-
pyarrow.lib.ArrowKeyError: 'geoarrow.point' not found in type factory
cause The GeoArrow extension type 'geoarrow.point' was not registered with PyArrow before being used to create an array.
fix Call `PointExtensionType.register_extension_type()` before using the string type name. For example: `from geoarrow_types import PointExtensionType; PointExtensionType.register_extension_type(); pa.array([...], type='geoarrow.point')`.
-
pyarrow.lib.ArrowInvalid: Expected a storage array of type Struct<x: double, y: double> but got List<item: string>
cause The PyArrow array provided as storage for an `ExtensionArray` does not match the expected underlying data type for the specific GeoArrow extension type.
fix Ensure the storage array's PyArrow type matches the `_storage_type` of the GeoArrow extension. For `PointExtensionType`, this is typically `pa.struct([pa.field('x', pa.float64()), pa.field('y', pa.float64())])`.
Warnings
- gotcha GeoArrow extension types must be explicitly registered with PyArrow before they can be used by their string identifier (e.g., 'geoarrow.point') when creating arrays. Using an unregistered name raises `pyarrow.lib.ArrowKeyError` (see Common errors).
- breaking Version 0.3.0 explicitly requires `pyarrow>=10.0.0`. Older versions of PyArrow (e.g., 9.x or earlier) are not compatible and will likely cause installation or runtime issues.
- gotcha When manually constructing `pyarrow.ExtensionArray` from a storage array, the storage array's PyArrow data type must precisely match the `_storage_type` defined by the GeoArrow Extension Type. Mismatches will raise `pyarrow.lib.ArrowInvalid`.
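The PyArrow version requirement can be checked at startup before any GeoArrow types are used. The following is a minimal sketch using only the standard library. The helper names `parse_release`, `meets_minimum`, and `installed_pyarrow_ok` are hypothetical, and the parsing deliberately ignores pre-release suffixes such as `rc1` or `.dev3`, so it is only an approximation of full version comparison.

```python
import re
from importlib import metadata

MINIMUM_PYARROW = (10, 0, 0)  # taken from the warning about pyarrow>=10.0.0


def parse_release(version):
    """Return the leading numeric release of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum=MINIMUM_PYARROW):
    """True if `version` is at least `minimum`, ignoring pre-release suffixes."""
    release = parse_release(version)
    # Pad short versions so that "10" compares like "10.0.0".
    release = release + (0,) * (len(minimum) - len(release))
    return release >= minimum


def installed_pyarrow_ok():
    """True if an installed pyarrow satisfies the minimum; False if missing or older."""
    try:
        return meets_minimum(metadata.version("pyarrow"))
    except metadata.PackageNotFoundError:
        return False
```

Calling `installed_pyarrow_ok()` early lets an application fail with a clear message instead of a less obvious import or runtime error.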
Install
-
pip install geoarrow-types
Imports
- PointExtensionType
from geoarrow_types import PointExtensionType
- LineStringExtensionType
from geoarrow_types import LineStringExtensionType
- PolygonExtensionType
from geoarrow_types import PolygonExtensionType
- WkbExtensionType
from geoarrow_types import WkbExtensionType
- WktExtensionType
from geoarrow_types import WktExtensionType
Quickstart
import pyarrow as pa
from geoarrow_types import PointExtensionType, WkbExtensionType
# 1. Register GeoArrow Point Type
# This step is crucial for PyArrow to recognize 'geoarrow.point' by name
PointExtensionType.register_extension_type()
# Create a PyArrow array using the registered GeoArrow point type
point_data = pa.array(
    [
        {'x': 1.0, 'y': 2.0},
        {'x': 3.0, 'y': 4.0},
        None,  # GeoArrow types support null values
    ],
    type="geoarrow.point",  # Use the registered string name
)
print("\nGeoArrow Point Array:")
print(point_data)
print("Is GeoArrow Point Type recognized:", isinstance(point_data.type, PointExtensionType))
# 2. Register GeoArrow WKB Type
WkbExtensionType.register_extension_type()
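# Example little-endian WKB bytes (usually generated by libraries like shapely)
# POINT (1 2): byte order, geometry type 1, then x and y as doubles
wkb_point_bytes = b'\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\x00@'
# LINESTRING (3 4, 5 6): byte order, geometry type 2, point count 2, then the coordinates
wkb_linestring_bytes = b'\x01\x02\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x08@\x00\x00\x00\x00\x00\x00\x10@\x00\x00\x00\x00\x00\x00\x14@\x00\x00\x00\x00\x00\x00\x18@'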
wkb_data = pa.array([wkb_point_bytes, wkb_linestring_bytes, None], type="geoarrow.wkb")
print("\nGeoArrow WKB Array:")
print(wkb_data)
print("Is GeoArrow WKB Type recognized:", isinstance(wkb_data.type, WkbExtensionType))
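To see what WKB values like those in the quickstart contain, the byte layout can be built and decoded with the standard library alone. This is a minimal sketch that handles only 2D Point and LineString geometries. The function `decode_wkb` is a hypothetical helper written for illustration and is not part of `geoarrow-types` or PyArrow.

```python
import struct


def decode_wkb(data):
    """Decode a 2D WKB Point or LineString into (geometry_name, coordinates)."""
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    endian = "<" if data[0] == 1 else ">"
    (geometry_type,) = struct.unpack_from(endian + "I", data, 1)
    offset = 5  # 1 byte-order byte + 4-byte geometry type
    if geometry_type == 1:
        return "Point", [struct.unpack_from(endian + "2d", data, offset)]
    if geometry_type == 2:
        (n_points,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
        coords = [
            struct.unpack_from(endian + "2d", data, offset + 16 * i)
            for i in range(n_points)
        ]
        return "LineString", coords
    raise ValueError(f"unsupported WKB geometry type: {geometry_type}")


# Build little-endian WKB for POINT (1 2) and LINESTRING (3 4, 5 6).
point_wkb = struct.pack("<BI2d", 1, 1, 1.0, 2.0)
line_wkb = struct.pack("<BII4d", 1, 2, 2, 3.0, 4.0, 5.0, 6.0)

print(decode_wkb(point_wkb))
print(decode_wkb(line_wkb))
```

A LineString carries a 4-byte point count between the geometry type and the coordinates, which is why it is 41 bytes here compared with 21 bytes for the Point.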