{"id":2866,"library":"apache-sedona","title":"Apache Sedona","description":"Apache Sedona™ is a cluster computing system for processing large-scale spatial data, extending modern cluster computing systems like Apache Spark, Apache Flink, and Snowflake with Spatial Resilient Distributed Datasets (SRDDs), Spatial SQL, and Spatial DataFrames. It enables developers to efficiently load, process, and analyze large-scale spatial data across machines. The current stable version is 1.8.1, and the project maintains an active release cadence with multiple major and minor updates throughout the year.","status":"active","version":"1.8.1","language":"en","source_language":"en","source_url":"https://github.com/apache/sedona","tags":["geospatial","spatial-data","spark","flink","snowflake","big-data","gis","dataframe","spatial-sql","sedonadb"],"install":[{"cmd":"pip install apache-sedona","lang":"bash","label":"Core package (for SedonaDB or if PySpark is managed separately)"},{"cmd":"pip install apache-sedona[spark]","lang":"bash","label":"With PySpark dependency (for Spark integration)"},{"cmd":"pip install \"apache-sedona[db]\"","lang":"bash","label":"With SedonaDB (single-node analytical database)"}],"dependencies":[{"reason":"Required for distributed processing with Apache Spark. 
It has been an optional dependency of `apache-sedona` since 1.1.0, to support environments where Spark is pre-installed.","package":"pyspark","optional":true},{"reason":"Used for geometry handling and operations within the Python API.","package":"shapely","optional":false},{"reason":"A Python package for classes without boilerplate, used by internal components.","package":"attrs","optional":false},{"reason":"SedonaDB (the single-node engine) is built on Apache Arrow for high-performance data processing.","package":"pyarrow","optional":true},{"reason":"SedonaDB utilizes Apache DataFusion for its query processing engine.","package":"datafusion","optional":true}],"imports":[{"note":"For initializing a Spark session with Sedona capabilities. Additional imports like ST_Contains, ST_Intersects, etc., are also typically from `sedona.spark`.","symbol":"SedonaContext","correct":"from sedona.spark import SedonaContext"},{"note":"For accessing the single-node SedonaDB engine.","symbol":"sedona.db","correct":"import sedona.db"}],"quickstart":{"code":"import sedona.db\nfrom shapely.geometry import Point\n\n# 1. Connect to SedonaDB (single-node engine for local quickstart)\nsd = sedona.db.connect()\n\n# 2. Create a DataFrame with spatial data\n# Note: Shapely's Point takes (x, y), i.e. (longitude, latitude)\ndata = [\n    {\"id\": 1, \"name\": \"Central Park\", \"geometry\": Point(-73.9665, 40.7812).wkt},\n    {\"id\": 2, \"name\": \"Empire State Building\", \"geometry\": Point(-73.9857, 40.7484).wkt},\n    {\"id\": 3, \"name\": \"Times Square\", \"geometry\": Point(-73.9855, 40.7580).wkt},\n]\n# Convert WKT strings to SedonaDB geometry objects\ndf = sd.create_dataframe(data).with_column(\"geometry\", sd.st_geomfromwkt(sd.column(\"geometry\")))\n\nprint(\"Original DataFrame:\")\ndf.print_schema()\ndf.show()\n\n# 3. Perform a spatial SQL query\nsd.create_view(\"nyc_landmarks\", df)  # Expose DataFrame as a temporary view\n\n# Find landmarks within a certain distance of a reference point\nreference_point = Point(-73.98, 40.75).wkt  # A (lon, lat) point near Midtown\n\nresult = sd.sql(\n    f\"\"\"SELECT name, ST_Distance(geometry, ST_GeomFromWKT('{reference_point}')) as distance_to_ref\n    FROM nyc_landmarks\n    WHERE ST_DWithin(geometry, ST_GeomFromWKT('{reference_point}'), 0.05) -- 0.05 degrees, roughly 5.5 km\n    ORDER BY distance_to_ref\n    \"\"\"\n)\n\nprint(\"\\nLandmarks within 0.05 degrees of the reference point:\")\nresult.show()","lang":"python","description":"This quickstart demonstrates how to use Apache Sedona's single-node engine, SedonaDB, to create a spatial DataFrame, convert WKT strings to native geometry objects, and execute a spatial SQL query to find points within a specified distance. This setup provides a simple local environment for getting started without needing a full Apache Spark cluster."},"warnings":[{"fix":"Ensure your environment uses Java Development Kit (JDK) 11 or higher and Apache Spark 3.4 or higher. Check Sedona's official documentation for detailed compatibility matrices.","message":"Apache Sedona 1.8.0 and later versions dropped support for Java 8 and Apache Spark 3.3. Users must upgrade to Java 11+ and Apache Spark 3.4+ to use these versions.","severity":"breaking","affected_versions":">=1.8.0"},{"fix":"Download the appropriate `sedona-spark-shaded` JAR from Maven Central or Apache Sedona's GitHub releases matching your Spark and Scala versions. Place it in `$SPARK_HOME/jars/` or add it to your `spark.jars.packages` configuration in `SedonaContext.builder()`.","message":"When using `apache-sedona` with Apache Spark, a `sedona-spark-shaded` (or `sedona-spark`) JAR file, compatible with your Spark and Scala versions, is required. This JAR must either be placed in `$SPARK_HOME/jars/` or specified via Spark configuration (e.g., `spark.jars.packages`). 
Failing to include the correct JAR can lead to `NoClassDefFoundError` or `NoSuchMethodError` for spatial functions.","severity":"gotcha","affected_versions":"all"},{"fix":"Remove any references to `sedona-python-adapter` JARs from your Spark configurations or `$SPARK_HOME/jars/` directory when using Sedona 1.5.0 or newer. Only the `sedona-spark-shaded` JAR is typically needed.","message":"Since Apache Sedona 1.5.0, the separate `sedona-python-adapter` JAR is no longer released, as its functionality was merged into the main `sedona-spark` JAR. Using or including older `sedona-python-adapter` JARs with newer Sedona versions can lead to dependency conflicts and runtime errors.","severity":"deprecated","affected_versions":">=1.5.0"},{"fix":"For older versions, either explicitly install `apache-sedona` without dependencies (`pip install --no-deps apache-sedona`) and then manage `pyspark` manually, or upgrade to Sedona 1.1.0 or newer, where `pyspark` is an optional dependency.","message":"In Apache Sedona versions 1.0.1 and earlier, the `pyspark` dependency in `setup.py` was mistakenly pinned to `pyspark<3.1.0`. This could cause `pip` to automatically uninstall a newer `pyspark` version (e.g., 3.1.1) and install an older one (e.g., 3.0.2) when installing `apache-sedona`, leading to version conflicts.","severity":"gotcha","affected_versions":"<=1.0.1"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}