{"library":"pyspark","title":"PySpark","description":"PySpark is the Python API for Apache Spark, a unified analytics engine for large-scale data processing. It allows users to leverage Spark's powerful distributed computing capabilities, including Spark SQL, DataFrames, Structured Streaming, and MLlib, using familiar Python syntax. The library is actively maintained, with the current version being 4.1.1, and follows the release cadence of the broader Apache Spark project.","language":"python","status":"active","last_verified":"Tue May 19","install":{"commands":["pip install pyspark","pip install pyspark[sql] pyspark[pandas_on_spark] pyspark[connect]"],"cli":{"name":"pyspark","version":"JAVA_HOME is not set. Would you like to install JDK 17 and set JAVA_HOME? (Y/N) JDK installation skipped. You can manually install JDK (17 or later) and set JAVA_HOME in your environment."}},"imports":["from pyspark.sql import SparkSession","from pyspark.sql import functions as F","from pyspark.sql import Row","from pyspark.sql.types import StructType, StructField, StringType, IntegerType"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import os\nfrom pyspark.sql import SparkSession\nfrom pyspark.sql.functions import col\n\n# PySpark requires JAVA_HOME to be set. Ensure it points to your JDK installation.\n# For example: os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-11-openjdk-amd64'\n\n# Create a SparkSession - the entry point to Spark functionality\nspark = SparkSession.builder \\\n    .appName(\"PySparkQuickstart\") \\\n    .getOrCreate()\n\n# Create a simple DataFrame\ndata = [(\"Alice\", 1), (\"Bob\", 2), (\"Charlie\", 3), (\"David\", 1)]\ncolumns = [\"name\", \"value\"]\ndf = spark.createDataFrame(data, columns)\n\n# Show the DataFrame schema and data\ndf.printSchema()\ndf.show()\n\n# Perform a simple transformation (filter) and action (show)\nfiltered_df = df.filter(col(\"value\") > 1)\nprint(\"Filtered DataFrame:\")\nfiltered_df.show()\n\n# Group by 'value' and count occurrences\ngrouped_df = df.groupBy(\"value\").count()\nprint(\"Grouped DataFrame:\")\ngrouped_df.show()\n\n# Stop the SparkSession\nspark.stop()\n","lang":"python","description":"This quickstart demonstrates how to initialize a SparkSession, create a DataFrame from Python data, display its schema and content, perform a filtering transformation, and then a grouping and aggregation. It also highlights the importance of setting `JAVA_HOME`.","tag":"stale","tag_description":"widespread failures or data too old to trust","last_tested":"2026-04-23","results":[{"runtime":"python:3.10-alpine","exit_code":-1},{"runtime":"python:3.10-slim","exit_code":-1},{"runtime":"python:3.11-alpine","exit_code":-1},{"runtime":"python:3.11-slim","exit_code":-1},{"runtime":"python:3.12-alpine","exit_code":-1},{"runtime":"python:3.12-slim","exit_code":-1},{"runtime":"python:3.13-alpine","exit_code":-1},{"runtime":"python:3.13-slim","exit_code":-1},{"runtime":"python:3.9-alpine","exit_code":-1},{"runtime":"python:3.9-slim","exit_code":-1}]},"compatibility":{"tag":"verified","tag_description":"installs cleanly on critical runtimes, fast import, recently tested","last_tested":"2026-05-19","installed_version":"4.0.2","pypi_latest":"4.1.1","is_stale":true,"summary":{"python_range":"3.10–3.9","success_rate":100,"avg_install_s":24.7,"avg_import_s":0.73,"wheel_type":"sdist"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"pyspark","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":null,"import_time_s":0.39,"mem_mb":12.8,"disk_size":"505.1M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"pyspark","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.67,"mem_mb":12.8,"disk_size":"505.1M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"sql","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":0.1,"import_time_s":0.62,"mem_mb":17.8,"disk_size":"865.2M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"sql","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.93,"mem_mb":17.8,"disk_size":"859.9M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"pyspark","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":30.9,"import_time_s":0.25,"mem_mb":12.8,"disk_size":"506M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"pyspark","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.4,"mem_mb":12.8,"disk_size":"506M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"sql","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":43.2,"import_time_s":0.51,"mem_mb":17.8,"disk_size":"839M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"sql","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.69,"mem_mb":17.8,"disk_size":"835M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"pyspark","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":null,"import_time_s":0.81,"mem_mb":13.9,"disk_size":"511.1M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"pyspark","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":1.02,"mem_mb":13.9,"disk_size":"511.0M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"sql","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":0.1,"import_time_s":1.09,"mem_mb":19.2,"disk_size":"885.0M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"sql","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":1.41,"mem_mb":19.2,"disk_size":"879.6M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"pyspark","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":30,"import_time_s":0.52,"mem_mb":13.9,"disk_size":"512M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"pyspark","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.9,"mem_mb":13.9,"disk_size":"512M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"sql","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":41.2,"import_time_s":0.85,"mem_mb":19.2,"disk_size":"858M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"sql","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.8,"mem_mb":19.1,"disk_size":"854M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"pyspark","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":null,"import_time_s":0.51,"mem_mb":13.9,"disk_size":"500.0M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"pyspark","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.91,"mem_mb":13.9,"disk_size":"500.0M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"sql","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":0.1,"import_time_s":0.86,"mem_mb":19.2,"disk_size":"867.0M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"sql","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":1.28,"mem_mb":19.2,"disk_size":"861.6M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"pyspark","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":36.5,"import_time_s":0.5,"mem_mb":13.9,"disk_size":"501M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"pyspark","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.78,"mem_mb":13.9,"disk_size":"501M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"sql","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":40.5,"import_time_s":0.8,"mem_mb":19.2,"disk_size":"840M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"sql","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":1.13,"mem_mb":19.2,"disk_size":"836M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"pyspark","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":null,"import_time_s":0.5,"mem_mb":14.3,"disk_size":"499.3M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"pyspark","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.75,"mem_mb":14.3,"disk_size":"499.2M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"sql","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":0.1,"import_time_s":0.69,"mem_mb":19.1,"disk_size":"865.5M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"sql","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.9,"mem_mb":19.1,"disk_size":"860.0M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"pyspark","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":31.4,"import_time_s":0.5,"mem_mb":14.3,"disk_size":"500M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"pyspark","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.8,"mem_mb":14.3,"disk_size":"500M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"sql","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":41,"import_time_s":0.74,"mem_mb":19.1,"disk_size":"839M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"sql","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":1.23,"mem_mb":19.1,"disk_size":"834M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"pyspark","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":null,"import_time_s":0.38,"mem_mb":12.1,"disk_size":"483.4M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"pyspark","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.57,"mem_mb":12.1,"disk_size":"483.4M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"sql","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":0.1,"import_time_s":0.59,"mem_mb":17.3,"disk_size":"818.4M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"sql","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.86,"mem_mb":17.3,"disk_size":"818.4M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"pyspark","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":32.2,"import_time_s":0.31,"mem_mb":12.1,"disk_size":"484M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"pyspark","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.39,"mem_mb":12.1,"disk_size":"484M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"sql","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":43.1,"import_time_s":0.55,"mem_mb":17.3,"disk_size":"793M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"sql","exit_code":0,"wheel_type":null,"failure_reason":null,"import_side_effects":null,"install_time_s":null,"import_time_s":0.91,"mem_mb":17.3,"disk_size":"793M"}]}}