PySpark Bindings for H3

1.2.6 · active · verified Thu Apr 16

h3-pyspark provides PySpark bindings for Uber's H3 hierarchical hexagonal geospatial indexing system. By exposing H3 functions as Spark UDFs, it allows geospatial indexing and analysis to run directly within Spark data pipelines. The library is currently at version 1.2.6 and is actively maintained, with recent releases addressing bug fixes and edge cases.

Install

pip install h3-pyspark

Imports

import h3_pyspark
from h3_pyspark.indexing import index_shape

Quickstart

This quickstart demonstrates how to initialize a SparkSession, create a DataFrame with geospatial coordinates, and use `h3_pyspark.geo_to_h3` to convert latitude and longitude to an H3 index. It also includes an example of the `index_shape` extension function for indexing GeoJSON polygons. Ensure `pyspark` is configured correctly for your environment.

from pyspark.sql import SparkSession, functions as F
import h3_pyspark
import os

# Initialize Spark Session (adjust master for your environment, e.g., 'local[*]'):
spark = SparkSession.builder.master(os.environ.get('SPARK_MASTER', 'local[*]')).appName("H3PySparkQuickstart").getOrCreate()

# Create a DataFrame with latitude, longitude, and desired H3 resolution
data = [{"lat": 37.769377, "lng": -122.388903, "resolution": 9}]
df = spark.createDataFrame(data)

# Convert geographic coordinates to H3 index
df_with_h3 = df.withColumn('h3_index', h3_pyspark.geo_to_h3(F.col('lat'), F.col('lng'), F.col('resolution')))

df_with_h3.show()

# Example of an extension function: index_shape fills a GeoJSON shape with H3 cells.
# Extension functions live in the h3_pyspark.indexing module.
from h3_pyspark.indexing import index_shape

geojson_polygon = "{\"type\":\"Polygon\",\"coordinates\":[[[-122.4,37.8],[-122.3,37.8],[-122.3,37.7],[-122.4,37.7],[-122.4,37.8]]]}"
polygon_df = spark.createDataFrame([{'id': 1, 'geometry': geojson_polygon, 'resolution': 9}])

polygon_h3_df = polygon_df.withColumn(
    'h3_cells',
    index_shape(F.col('geometry'), F.col('resolution'))
)

polygon_h3_df.show(truncate=False)

spark.stop()
