Apache Sedona

1.8.1 verified Sat Apr 25 auth: no python

Apache Sedona™ is a cluster computing system for processing large-scale spatial data. It extends engines such as Apache Spark, Apache Flink, and Snowflake with Spatial Resilient Distributed Datasets (SRDDs), Spatial SQL, and Spatial DataFrames, so developers can load, process, and analyze spatial data across machines. The current stable version is 1.8.1, and the project releases several updates each year.

pip install apache-sedona
error ModuleNotFoundError: No module named 'keplergl'
cause The 'keplergl' module, which Sedona's visualization features require, is not installed.
fix
Install the 'keplergl' module using pip: 'pip install keplergl'.
error NameError: name 'ST_Point' is not defined
cause 'ST_Point' was called as a Python function without first being imported from Sedona's SQL function modules.
fix
Import the function directly: 'from sedona.sql.st_constructors import ST_Point'.
error User-defined types are not supported
cause Sedona's User-Defined Types (UDTs) for geometry are not compatible with Iceberg v3 native geometry types.
fix
Check that your Sedona and Iceberg versions are compatible, or store geometries in a type both sides support, for example WKB written with ST_AsBinary and read back with ST_GeomFromWKB.
error No module named 'sedona.sql'
cause The 'sedona.sql' module is not found, possibly due to incorrect installation or import.
fix
Verify that Apache Sedona is installed correctly and import the module as 'from sedona.sql import *'.
error ModuleNotFoundError: No module named 'sedona.spark'
cause The 'sedona.spark' module is not found, indicating that the Sedona package may not be installed or is improperly configured.
fix
Install Apache Sedona using pip: 'pip install apache-sedona'.
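Several of the import errors above come down to a package that is missing from the active Python environment. A minimal sketch, using only the standard library, that reports which of the relevant top-level modules cannot be found. The helper name `missing_modules` is hypothetical and not part of Sedona.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that are not installed."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Top-level modules named in the errors above.
    print(missing_modules(["sedona", "pyspark", "keplergl"]))
```

An empty list means all three are importable from the interpreter that ran the check. If `sedona` is reported missing even though pip shows it installed, pip and the interpreter are probably pointing at different environments.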
breaking Apache Sedona 1.8.0 and later versions dropped support for Java 8 and Apache Spark 3.3. Users must upgrade to Java 11+ and Apache Spark 3.4+ to use these versions.
fix Ensure your environment uses Java Development Kit (JDK) 11 or higher and Apache Spark 3.4 or higher. Check Sedona's official documentation for detailed compatibility matrices.
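The minimums above can be checked before a session is started. A rough sketch, assuming version strings such as `pyspark.__version__` or the version reported by `java -version` are passed in as plain text. The helper names `parse_version` and `meets_minimum` are hypothetical.

```python
import re


def parse_version(text):
    """Extract the leading dotted numeric part of a version string as a tuple of ints."""
    match = re.match(r"(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version, minimum):
    """Return True when `version` (a string) is at least `minimum` (a tuple of ints)."""
    return parse_version(version) >= minimum


# Minimums stated above for Sedona 1.8.0 and later.
MIN_SPARK = (3, 4)
MIN_JAVA = (11,)

if __name__ == "__main__":
    print(meets_minimum("3.3.4", MIN_SPARK))     # Spark 3.3 is below the minimum
    print(meets_minimum("1.8.0_292", MIN_JAVA))  # legacy Java 8 version string
```

Java 8 reports itself as `1.8.x`, which parses to `(1, 8, 0)` and therefore fails the Java 11 check, as intended.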
gotcha When using `apache-sedona` with Apache Spark, a `sedona-spark-shaded` (or `sedona-spark`) JAR file that matches your Spark and Scala versions is required. The JAR must either be placed in `$SPARK_HOME/jars/` or be supplied through Spark configuration (e.g., `spark.jars.packages`). Without the correct JAR, spatial functions can fail with `NoClassDefFoundError` or `NoSuchMethodError`.
fix Download the `sedona-spark-shaded` JAR that matches your Spark and Scala versions from Maven Central or Apache Sedona's GitHub releases and place it in `$SPARK_HOME/jars/`. Alternatively, skip the download and pass the artifact's Maven coordinate through `spark.jars.packages` in `SedonaContext.builder()`.
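To illustrate the `spark.jars.packages` route, here is a sketch that builds the Maven coordinate from the versions in use. It assumes the artifact naming pattern `sedona-spark-shaded-<Spark major.minor>_<Scala version>`; the helper `sedona_spark_package` is hypothetical, so confirm the exact artifact on Maven Central before relying on it.

```python
def sedona_spark_package(sedona_version, spark_version, scala_version="2.12"):
    """Build a Maven coordinate for the sedona-spark-shaded artifact.

    Assumes the pattern sedona-spark-shaded-<spark major.minor>_<scala>.
    """
    spark_major_minor = ".".join(spark_version.split(".")[:2])
    artifact = f"sedona-spark-shaded-{spark_major_minor}_{scala_version}"
    return f"org.apache.sedona:{artifact}:{sedona_version}"


coordinate = sedona_spark_package("1.8.1", "3.5.1")
print(coordinate)  # org.apache.sedona:sedona-spark-shaded-3.5_2.12:1.8.1

# The coordinate would then be handed to Spark, for example:
#   from sedona.spark import SedonaContext
#   config = (
#       SedonaContext.builder()
#       .config("spark.jars.packages", coordinate)
#       .getOrCreate()
#   )
#   sedona = SedonaContext.create(config)
```

Sedona's documentation typically lists a `geotools-wrapper` package alongside this one; its version is deliberately not filled in here.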
deprecated Since Apache Sedona 1.5.0, the separate `sedona-python-adapter` JAR is no longer released, as its functionality was merged into the main `sedona-spark` JAR. Using or including older `sedona-python-adapter` JARs with newer Sedona versions can lead to dependency conflicts and runtime errors.
fix When using Sedona 1.5.0 or newer, remove any `sedona-python-adapter` JARs from your Spark configuration and from the `$SPARK_HOME/jars/` directory. Typically only the `sedona-spark-shaded` JAR is needed.
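Leftover adapter JARs are easy to miss in a long-lived Spark installation. A small sketch that lists them, assuming the files keep the `sedona-python-adapter` prefix in their names. The helper `find_legacy_adapter_jars` is hypothetical.

```python
import os
from pathlib import Path


def find_legacy_adapter_jars(jars_dir):
    """Return sorted paths of sedona-python-adapter JARs found directly in `jars_dir`."""
    directory = Path(jars_dir)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("sedona-python-adapter*.jar"))


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        for jar in find_legacy_adapter_jars(Path(spark_home) / "jars"):
            print(f"legacy JAR to remove: {jar}")
```

The script only reports matches and deletes nothing. JARs referenced through `spark.jars` or `spark.jars.packages` settings still need to be checked by hand.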
gotcha In Apache Sedona versions 1.0.1 and earlier, the `pyspark` dependency in `setup.py` was mistakenly configured to be `< v3.1.0`. This could cause `pip` to automatically uninstall a newer `pyspark` version (e.g., 3.1.1) and install an older one (e.g., 3.0.2) upon `apache-sedona` installation, leading to version conflicts.
fix For older versions, either explicitly install `apache-sedona` without dependencies (`pip install --no-deps apache-sedona`) and then manage `pyspark` manually, or upgrade to Sedona 1.1.0 or newer where `pyspark` is an optional dependency.
pip install "apache-sedona[spark]"
pip install "apache-sedona[db]"
runtime variant status import time mem disk
3.10-alpine db
3.10-alpine default
3.10-alpine spark
3.10-slim db
3.10-slim default
3.10-slim spark
3.11-alpine db
3.11-alpine default
3.11-alpine spark
3.11-slim db
3.11-slim default
3.11-slim spark
3.12-alpine db
3.12-alpine default
3.12-alpine spark
3.12-slim db
3.12-slim default
3.12-slim spark
3.13-alpine db
3.13-alpine default
3.13-alpine spark
3.13-slim db
3.13-slim default
3.13-slim spark
3.9-alpine db
3.9-alpine default
3.9-alpine spark
3.9-slim db
3.9-slim default
3.9-slim spark

This quickstart demonstrates how to use Apache Sedona's single-node engine, SedonaDB, to create a spatial DataFrame, convert WKT strings to native geometry objects, and execute a spatial SQL query to find points within a specified distance. This setup provides a simple local environment for getting started without needing a full Apache Spark cluster.

import sedona.db
from shapely.geometry import Point

# 1. Connect to SedonaDB (single-node engine for local quickstart)
sd = sedona.db.connect()

# 2. Create a DataFrame with spatial data
# Shapely points take (x, y), which for geographic data means (longitude, latitude).
landmarks = [
    (1, "Central Park", Point(-73.9665, 40.7812).wkt),
    (2, "Empire State Building", Point(-73.9857, 40.7484).wkt),
    (3, "Times Square", Point(-73.9855, 40.7580).wkt),
]
# Inline the rows as a SQL VALUES list and convert the WKT strings to geometries.
# String interpolation is acceptable for these fixed literals, not for untrusted input.
values = ", ".join(f"({id_}, '{name}', '{wkt}')" for id_, name, wkt in landmarks)
df = sd.sql(
    f"""SELECT id, name, ST_GeomFromWKT(wkt) AS geometry
    FROM (VALUES {values}) AS t(id, name, wkt)
    """
)

print("Original DataFrame:")
print(df.schema)
df.show()

# 3. Perform a spatial SQL query
df.to_view("nyc_landmarks")  # Expose DataFrame as a temporary view

# Find landmarks within a certain distance of a reference point
reference_point = Point(-73.98, 40.75).wkt  # A point near Midtown

result = sd.sql(
    f"""SELECT name, ST_Distance(geometry, ST_GeomFromWKT('{reference_point}')) as distance_to_ref
    FROM nyc_landmarks
    WHERE ST_DWithin(geometry, ST_GeomFromWKT('{reference_point}'), 0.05) -- 0.05 degrees approx. 5.5km
    ORDER BY distance_to_ref
    """
)

print("\nLandmarks within 0.05 degrees of the reference point:")
result.show()
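
The 0.05 threshold in the query is expressed in degrees because the geometries hold longitude/latitude values, so the distance functions work in coordinate units and not in meters. A rough sketch of what that threshold covers on the ground, assuming a spherical Earth with a mean radius of 6371 km. The helper `degrees_to_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def degrees_to_km(degrees, latitude_deg):
    """Return (north_south_km, east_west_km) spanned by `degrees` at a given latitude."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    north_south = degrees * km_per_degree
    # Meridians converge toward the poles, so a degree of longitude shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns_km, ew_km = degrees_to_km(0.05, 40.75)
print(f"north-south: {ns_km:.2f} km, east-west: {ew_km:.2f} km")
```

At New York's latitude the threshold works out to about 5.6 km north-south but only about 4.2 km east-west, so the search region is not a true circle on the ground. For metric distances, the data would need to be projected or compared with a spheroid-aware function.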