Apache Sedona

1.8.1 · active · verified Sat Apr 11

Apache Sedona™ is a cluster computing system for processing large-scale spatial data. It extends engines such as Apache Spark, Apache Flink, and Snowflake with Spatial Resilient Distributed Datasets (SRDDs), Spatial SQL, and Spatial DataFrames, so developers can load, process, and analyze spatial data across machines. The current stable version is 1.8.1, and the project is actively maintained, with several major and minor releases each year.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to use Apache Sedona's single-node engine, SedonaDB, to create a spatial DataFrame, convert WKT strings to native geometry objects, and execute a spatial SQL query to find points within a specified distance. This setup provides a simple local environment for getting started without needing a full Apache Spark cluster.

import pandas as pd
import sedona.db
from shapely.geometry import Point

# 1. Connect to SedonaDB (single-node engine for local quickstart)
sd = sedona.db.connect()

# 2. Create a DataFrame with spatial data
# shapely's Point takes (x, y), which for geographic data means (longitude, latitude)
data = [
    {"id": 1, "name": "Central Park", "wkt": Point(-73.9665, 40.7812).wkt},
    {"id": 2, "name": "Empire State Building", "wkt": Point(-73.9857, 40.7484).wkt},
    {"id": 3, "name": "Times Square", "wkt": Point(-73.9855, 40.7580).wkt},
]
# Load the rows through pandas and register them as a view
# (assumes SedonaDB's create_data_frame() and DataFrame.to_view(); check your installed version)
sd.create_data_frame(pd.DataFrame(data)).to_view("landmarks_wkt")

# Convert WKT strings to SedonaDB geometry objects
df = sd.sql("SELECT id, name, ST_GeomFromWKT(wkt) AS geometry FROM landmarks_wkt")

print("Original DataFrame:")
print(df.schema)
df.show()

# 3. Perform a spatial SQL query
df.to_view("nyc_landmarks")  # Expose DataFrame as a temporary view

# Find landmarks within a certain distance of a reference point
reference_point = Point(-73.98, 40.75).wkt  # A point near Midtown, as (longitude, latitude)

# Distances are planar and in degrees: 0.05 degrees is about 5.5 km north-south,
# but only about 4.2 km east-west at this latitude
result = sd.sql(
    f"""SELECT name, ST_Distance(geometry, ST_GeomFromWKT('{reference_point}')) AS distance_to_ref
    FROM nyc_landmarks
    WHERE ST_DWithin(geometry, ST_GeomFromWKT('{reference_point}'), 0.05)
    ORDER BY distance_to_ref
    """
)

print("\nLandmarks within 0.05 degrees of the reference point:")
result.show()
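
The distance threshold in the query above is expressed in degrees, so it does not correspond to a single ground distance. As a rough cross-check, the sketch below uses only the Python standard library to compute great-circle distances in meters for the same landmarks. It is not part of Sedona: the `haversine_m` helper, the spherical-Earth radius, and the variable names are illustrative assumptions, and coordinates are given as (longitude, latitude).

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Same landmarks as the quickstart, as (longitude, latitude) pairs
landmarks = {
    "Central Park": (-73.9665, 40.7812),
    "Empire State Building": (-73.9857, 40.7484),
    "Times Square": (-73.9855, 40.7580),
}
reference = (-73.98, 40.75)  # point near Midtown

# Distance in meters from the reference point to each landmark
distances_m = {
    name: haversine_m(reference[0], reference[1], lon, lat)
    for name, (lon, lat) in landmarks.items()
}
for name, d in sorted(distances_m.items(), key=lambda kv: kv[1]):
    print(f"{name}: {d:.0f} m")

# Ground length of a 0.05 degree step along each axis at the reference latitude
deg_lat_m = haversine_m(reference[0], reference[1], reference[0], reference[1] + 0.05)
deg_lon_m = haversine_m(reference[0], reference[1], reference[0] + 0.05, reference[1])
print(f"0.05 deg north-south: {deg_lat_m:.0f} m, east-west: {deg_lon_m:.0f} m")
```

Because a degree of longitude shrinks with latitude, a 0.05 degree radius around Midtown covers a region that is shorter east-west than north-south. When a true metric radius matters, one common approach is to reproject the geometries to a projected coordinate system before filtering by distance.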
