{"id":629,"library":"delta-spark","title":"Delta Lake Python APIs for Apache Spark","description":"delta-spark provides Python APIs to interact with Delta Lake tables using Apache Spark. It enables operations like reading, writing, and time-traveling Delta tables, leveraging Spark's distributed processing capabilities. The library maintains a rapid release cadence, often releasing multiple patch and minor versions for each major iteration. The current version is 4.1.0.","status":"active","version":"4.1.0","language":"python","source_language":"en","source_url":"https://github.com/delta-io/delta","tags":["spark","delta lake","data lake","data engineering","pyspark"],"install":[{"cmd":"pip install delta-spark","lang":"bash","label":"Install delta-spark"}],"dependencies":[{"reason":"Required at runtime to interact with Apache Spark and enable Delta Lake features. delta-spark extends Spark's functionality.","package":"pyspark","optional":false}],"imports":[{"note":"Used for DML operations (update, delete, merge) and managing Delta tables.","symbol":"DeltaTable","correct":"from delta.tables import DeltaTable"},{"note":"Essential configuration to enable Delta Lake features within a SparkSession.","symbol":"SparkSession config for Delta","correct":"SparkSession.builder.config(\"spark.sql.extensions\", \"io.delta.sql.DeltaSparkSessionExtension\").config(\"spark.sql.catalog.spark_catalog\", \"org.apache.spark.sql.delta.catalog.DeltaCatalog\")"}],"quickstart":{"code":"from pyspark.sql import SparkSession\nfrom delta.tables import DeltaTable\nimport os\n\n# Configure SparkSession for Delta Lake\nspark = (\n    SparkSession.builder.appName(\"DeltaSparkQuickstart\")\n    .config(\"spark.sql.extensions\", \"io.delta.sql.DeltaSparkSessionExtension\")\n    .config(\n        \"spark.sql.catalog.spark_catalog\",\n        \"org.apache.spark.sql.delta.catalog.DeltaCatalog\",\n    )\n    .getOrCreate()\n)\n\n# Create a simple DataFrame\ndata = spark.createDataFrame([(1, \"Alice\"), (2, \"Bob\")], [\"id\", \"name\"])\n\n# Define a path for the Delta table\ndelta_table_path = os.path.join(os.getcwd(), \"tmp\", \"delta_table\")\n\n# Write data to a Delta table\nprint(f\"Writing data to Delta table at: {delta_table_path}\")\ndata.write.format(\"delta\").mode(\"overwrite\").save(delta_table_path)\n\n# Read data from the Delta table\nprint(f\"Reading data from Delta table at: {delta_table_path}\")\ndf_read = spark.read.format(\"delta\").load(delta_table_path)\ndf_read.show()\n\n# Use DeltaTable API for operations (e.g., detail)\ndelta_table = DeltaTable.forPath(spark, delta_table_path)\nprint(\"Delta table description:\")\ndelta_table.detail().show()\n\n# Stop SparkSession\nspark.stop()\n","lang":"python","description":"This quickstart demonstrates how to initialize a SparkSession with Delta Lake extensions, write a DataFrame to a Delta table, and then read the data back. It also shows how to get details about the Delta table using the `DeltaTable` API."},"warnings":[{"fix":"If using catalog-managed tables in v4.0.0, update Spark configurations and any code referencing the feature to use `catalogManaged` and `io.unitycatalog.tableId` when upgrading to v4.0.1 or later.","message":"The preview feature for catalog-managed tables was renamed from `catalogOwned-preview` to `catalogManaged` in v4.0.1. 
Legacy `ucTableId` also transitioned to `io.unitycatalog.tableId`.","severity":"breaking","affected_versions":"4.0.0"},{"fix":"Always consult the official Delta Lake release notes and documentation to ensure your `delta-spark` and `pyspark` versions are compatible. Upgrade both in tandem if necessary.","message":"Each `delta-spark` release is built against and optimized for specific Apache Spark versions. While some backward compatibility exists (e.g., Delta 4.1.0 supports Spark 4.1.0 and 4.0.1), major Spark version upgrades can introduce incompatibilities or require specific `delta-spark` versions.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For production systems, exercise caution with 'preview' features. If using such features, be prepared for API changes and thoroughly test upgrades. Migrate to the stable naming conventions in later versions.","message":"The 'catalog-managed tables' feature introduced in v4.0.0 (preview) was explicitly stated to be in an RFC stage and 'subject to change'. Early adopters of this feature in v4.0.0 experienced breaking changes in v4.0.1.","severity":"gotcha","affected_versions":"4.0.0"},{"fix":"Ensure your Python environment is 3.10 or higher. You can check your Python version using `python --version`.","message":"Starting with version 4.x, `delta-spark` requires Python 3.10 or newer.","severity":"gotcha","affected_versions":">=4.0.0"},{"fix":"Ensure a Java Development Kit (JDK) or Java Runtime Environment (JRE) is installed and the `JAVA_HOME` environment variable is correctly set to its installation path. For example, `export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64`.","message":"PySpark requires a Java Runtime Environment (JRE) to be installed and the `JAVA_HOME` environment variable to be set, pointing to the JRE/JDK installation directory. Without it, the Java gateway cannot start, leading to `PySparkRuntimeError: [JAVA_GATEWAY_EXITED]`.","severity":"breaking","affected_versions":"All versions"},{"fix":"For minimal container environments (e.g., Alpine Linux), ensure that essential shell utilities (like `bash`) and Java Runtime Environment (JRE) are explicitly installed. For Alpine, this typically involves adding `apk add bash openjdk17-jre` (or a suitable JRE version) to your Dockerfile.","message":"PySpark's Java gateway startup process relies on common shell utilities (like `bash`, `sh`, `env`) for environment setup. In minimal Linux distributions, such as Alpine, these utilities or their symlinks may not be installed by default, leading to errors like 'env: can't execute 'bash': No such file or directory' and subsequent PySparkRuntimeError: [JAVA_GATEWAY_EXITED] failures.","severity":"breaking","affected_versions":"All versions (when running in minimal container environments like Alpine)"}],"env_vars":null,"last_verified":"2026-05-12T16:59:33.606Z","next_check":"2026-06-26T00:00:00.000Z","problems":[{"fix":"Ensure `delta-spark` is installed via `pip install delta-spark`. Additionally, your SparkSession must be configured to use Delta Lake. 
For local PySpark, configure the SparkSession with the appropriate Delta Lake packages and extensions:\n```python\nfrom pyspark.sql import SparkSession\nfrom delta import configure_spark_with_delta_pip\n\nbuilder = SparkSession.builder \\\n    .appName(\"DeltaSparkApp\") \\\n    .master(\"local[*]\") \\\n    .config(\"spark.sql.extensions\", \"io.delta.sql.DeltaSparkSessionExtension\") \\\n    .config(\"spark.sql.catalog.spark_catalog\", \"org.apache.spark.sql.delta.catalog.DeltaCatalog\")\n\nspark = configure_spark_with_delta_pip(builder).getOrCreate()\n# Now you can import from delta.tables\nfrom delta.tables import DeltaTable\n```\nFor production clusters (e.g., Databricks, EMR, Synapse), ensure the Delta Lake runtime/library is attached to your cluster, and avoid `pip install delta-spark` if a native version is provided, as it can cause conflicts.","cause":"This error typically occurs when the `delta-spark` Python package is not installed in the environment where your PySpark application is running, or when the Delta Lake JARs are not correctly linked with your Spark session, preventing the Python wrapper from finding the necessary Delta modules.","error":"ModuleNotFoundError: No module named 'delta'"},{"fix":"Ensure your SparkSession is configured with the correct Delta Lake packages. When submitting Spark jobs, use the `--packages` option with `spark-submit`. For example, for `delta-spark` version 4.1.0 and Spark 3.x with Scala 2.12:\n```bash\nspark-submit \\\n  --packages io.delta:delta-core_2.12:4.1.0 \\\n  --conf \"spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension\" \\\n  --conf \"spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog\" \\\n  your_script.py\n```\nIf creating a SparkSession programmatically, add these configurations:\n```python\nspark = SparkSession.builder \\\n    .appName(\"DeltaApp\") \\\n    .config(\"spark.jars.packages\", \"io.delta:delta-core_2.12:4.1.0\") \\\n    .config(\"spark.sql.extensions\", \"io.delta.sql.DeltaSparkSessionExtension\") \\\n    .config(\"spark.sql.catalog.spark_catalog\", \"org.apache.spark.sql.delta.catalog.DeltaCatalog\") \\\n    .getOrCreate()\n```","cause":"This Java exception indicates that the underlying Spark JVM cannot find the Delta Lake connector classes because the required Delta Lake JARs are not included in Spark's classpath or are not correctly configured when the SparkSession is initialized.","error":"Py4JJavaError: An error occurred while calling oXX.save. : java.lang.ClassNotFoundException: Failed to find data source: delta"},{"fix":"First, ensure the path you are providing points to an actual Delta table (a directory containing `_delta_log`). Second, verify that your SparkSession is correctly configured with the Delta Lake extensions and catalog. Refer to the fix for `java.lang.ClassNotFoundException` to ensure these configurations are in place. 
If using `DeltaTable.createIfNotExists()`, ensure absolute paths are used, especially in local environments.","cause":"This error occurs when Spark attempts to perform a Delta Lake-specific operation (like reading a Delta table or calling `DeltaTable.forPath()`) on a path or table that does not contain a valid Delta transaction log (`_delta_log` directory) or when the SparkSession is not properly configured to recognize Delta tables.","error":"pyspark.sql.utils.AnalysisException: `path/table` is not a Delta table."},{"fix":"In Azure Synapse Notebooks (and potentially other managed Spark environments), avoid explicitly installing `delta-spark` via `%pip install delta-spark`. Instead, the `delta.tables` module is usually available directly from the pre-configured Spark environment. Remove the `pip install delta-spark` command and directly use `from delta.tables import DeltaTable`.","cause":"This specific `ModuleNotFoundError` is commonly encountered in environments like Azure Synapse Notebooks when the `delta-spark` package is installed via `pip`. It indicates a conflict with the native Delta Lake and PySpark libraries pre-installed or integrated into such platforms, where the installed `delta-spark` package tries to import `pyspark.errors` which might not be exposed or structured in the same way by the platform's native PySpark distribution.","error":"ModuleNotFoundError: No module named 'pyspark.errors'"},{"fix":"Ensure that your `delta-spark` and PySpark versions are compatible and that the Delta Lake feature you are trying to use is supported by your runtime. For `withSchemaEvolution`, this feature became available in Databricks Runtime 16.0+. Either upgrade your `delta-spark` package and Spark runtime to a version that supports the feature, or use an alternative method for schema evolution such as setting the Spark configuration `spark.databricks.delta.schema.autoMerge.enabled` to `true` or using `.option(\"mergeSchema\", \"true\")` on DataFrame writes.","cause":"This error typically indicates a version mismatch, where you are attempting to use a feature (like `withSchemaEvolution`) that is available in a newer version of Delta Lake, but your current `delta-spark` library or the underlying Databricks Runtime/Spark environment is an older version that does not support it.","error":"AttributeError: 'DeltaMergeBuilder' object has no attribute 'withSchemaEvolution'"}],"ecosystem":"pypi","meta_description":null,"install_score":100,"install_tag":"verified","quickstart_score":0,"quickstart_tag":"stale","pypi_latest":"4.2.0","install_checks":{"last_tested":"2026-05-12","tag":"verified","tag_description":"installs cleanly on critical runtimes, fast import, recently tested","results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":null,"import_time_s":0.47,"mem_mb":12.9,"disk_size":"505.9M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.46,"mem_mb":12.9,"disk_size":"505.9M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":31.6,"import_time_s":0.34,"mem_mb":12.9,"disk_size":"506M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim 
(glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.3,"mem_mb":12.9,"disk_size":"506M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":null,"import_time_s":0.68,"mem_mb":14,"disk_size":"512.0M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.72,"mem_mb":14,"disk_size":"511.9M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":31.2,"import_time_s":0.61,"mem_mb":14,"disk_size":"513M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.57,"mem_mb":14,"disk_size":"512M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":null,"import_time_s":0.56,"mem_mb":14,"disk_size":"501.0M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.58,"mem_mb":14,"disk_size":"500.9M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":32,"import_time_s":0.62,"mem_mb":14.2,"disk_size":"501M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.6,"mem_mb":14.2,"disk_size":"501M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":null,"import_time_s":0.56,"mem_mb":14.5,"disk_size":"500.3M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.59,"mem_mb":14.5,"disk_size":"500.1M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":30.6,"import_time_s":0.56,"mem_mb":14.5,"disk_size":"501M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.58,"mem_mb":14.5,"disk_size":"501M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":null,"import_time_s":0.44,"mem_mb":12.2,"disk_size":"484.2M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.41,"mem_mb":12.2,"disk_size":"484.2M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":" 
$EXIT -eq 0 ","exit_code":0,"wheel_type":"sdist","failure_reason":null,"install_time_s":31.2,"import_time_s":0.39,"mem_mb":12.2,"disk_size":"485M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":0.37,"mem_mb":12.2,"disk_size":"485M"}]},"quickstart_checks":{"last_tested":"2026-04-24","tag":"stale","tag_description":"widespread failures or data too old to trust","results":[{"runtime":"python:3.10-alpine","exit_code":-1},{"runtime":"python:3.10-slim","exit_code":-1},{"runtime":"python:3.11-alpine","exit_code":-1},{"runtime":"python:3.11-slim","exit_code":-1},{"runtime":"python:3.12-alpine","exit_code":-1},{"runtime":"python:3.12-slim","exit_code":-1},{"runtime":"python:3.13-alpine","exit_code":-1},{"runtime":"python:3.13-slim","exit_code":-1},{"runtime":"python:3.9-alpine","exit_code":-1},{"runtime":"python:3.9-slim","exit_code":-1}]}}