GeoArrow PyArrow

0.2.0 · active · verified Thu Apr 16

GeoArrow PyArrow is a Python library for working with GeoArrow data through PyArrow, the Python bindings for the Arrow C++ library. It provides tools to convert geospatial data (such as points, lines, and polygons) to and from the GeoArrow format, building on Apache Arrow for high-performance columnar data processing. As of version 0.2.0, it focuses on implementing the core GeoArrow types and on interoperability with Shapely and PyArrow. Releases follow GeoArrow specification updates and core development.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart builds a GeoArrow array from Shapely geometries, inspects its type and contents, converts it back to Shapely, and adds it to a PyArrow Table. The `ga.array()` constructor performs the conversion into a GeoArrow array.

import geoarrow.pyarrow as ga
import pyarrow as pa
import shapely.geometry
import shapely.wkb

# Create some Shapely Point geometries
points = [
    shapely.geometry.Point(1, 2),
    shapely.geometry.Point(3, 4),
    shapely.geometry.Point(5, 6),
]

# pyarrow cannot infer an Arrow type from Shapely objects, so encode them as
# WKB first; ga.array() wraps a list of bytes as a WKB-backed GeoArrow array
geoarrow_array = ga.array([point.wkb for point in points])

print("GeoArrow Array Type:", geoarrow_array.type)
print("GeoArrow Array:", geoarrow_array)

# Convert back to Shapely objects by decoding the WKB storage
wkb_values = geoarrow_array.storage.to_pylist()
shapely_roundtrip = [shapely.wkb.loads(value) for value in wkb_values]
print("Shapely Roundtrip:", shapely_roundtrip)

# Integrate GeoArrow arrays into a PyArrow Table
table = pa.table({"id": [1, 2, 3], "geometry": geoarrow_array})
print("\nPyArrow Table:")
print(table)

# The GeoArrow extension types must be registered before loading data that
# contains them. Importing geoarrow.pyarrow normally does this already; the
# call below registers them explicitly if needed.
# ga.register_extension_types()
