Delta Lake Python
`deltalake` is an open-source Python library that provides native Delta Lake bindings built on the `delta-rs` Rust library, offering efficient, robust interaction with Delta Lake tables without Apache Spark or JVM dependencies. It integrates seamlessly with data manipulation libraries such as Pandas, Polars, and PyArrow. The library is actively developed and frequently updated; the current version is 1.5.0.
Warnings
- breaking In `deltalake` v1.5.0, the `get_add_actions` method now returns an `ArrowTable` instead of an `ArrowRecordBatch`. Code relying on the specific `ArrowRecordBatch` type or its API will break.
- breaking Checkpoint schema changes between `deltalake` versions, notably around `0.25.5` and `1.0.2`, can lead to `DeltaError: Failed to parse parquet: Arrow: Incompatible type` when reading or creating checkpoints for older tables, especially if the `nullable` property for fields like `path`, `size`, and `modificationTime` changed from `True` to `False`.
- gotcha The `deltalake` Python library is a native implementation distinct from `delta-spark`. While both interact with Delta Lake, `deltalake` does not require Apache Spark or a JVM. Ensure you are using the correct library for your ecosystem, as `delta-spark` imports (e.g., `from delta.tables import DeltaTable`) are not compatible with `deltalake`.
- gotcha Concurrent write operations (e.g., multiple processes appending or updating a table simultaneously) can raise `ConcurrentAppendException`, `ConcurrentDeleteReadException`, or `ConcurrentModificationException` due to optimistic concurrency control. Delta Lake still guarantees ACID properties, but conflicting commits are not merged automatically; the caller must handle the exception, typically by retrying the operation.
- gotcha Operations like `DeltaTable.delete()` or `write_deltalake(mode="overwrite")` only mark files as removed in the Delta transaction log; the physical Parquet files remain in storage. This can inflate storage costs unless old files are periodically cleaned up with `DeltaTable.vacuum()`.
- gotcha Some operations, especially `MERGE`, may require configuring disk spilling for large datasets to avoid out-of-memory errors.
Install
-
pip install deltalake pandas pyarrow
Imports
- DeltaTable
from deltalake import DeltaTable
- write_deltalake
from deltalake import write_deltalake
Quickstart
import os
import shutil

import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Define a Delta Lake table path
table_path = "./tmp_delta_table"
# Remove any leftover table so the example starts from a clean state
if os.path.exists(table_path):
    shutil.rmtree(table_path)
# 1. Create a Pandas DataFrame
df = pd.DataFrame({"id": [1, 2], "value": ["A", "B"]})
# 2. Write the DataFrame to a Delta Lake table
write_deltalake(table_path, df)
print(f"Initial Delta table created at: {table_path}")
# 3. Load the Delta table
dt = DeltaTable(table_path)
print(f"Current table version: {dt.version()}")
print("Current table data:")
print(dt.to_pandas().to_string(index=False))
# 4. Append new data to the table
new_df = pd.DataFrame({"id": [3, 4], "value": ["C", "D"]})
write_deltalake(table_path, new_df, mode="append")
print("\nData appended. New table version:")
dt_updated = DeltaTable(table_path)
print(f"Current table version: {dt_updated.version()}")
print("Updated table data:")
print(dt_updated.to_pandas().to_string(index=False))
# 5. Read an older version of the table (Time Travel)
dt_v0 = DeltaTable(table_path, version=0)
print("\nData from version 0 (time travel):")
print(dt_v0.to_pandas().to_string(index=False))
# Clean up temporary files (optional)
# shutil.rmtree(table_path)