Delta Lake Python

1.5.0 · active · verified Sun Mar 29

`deltalake` is an open-source Python library that provides native Delta Lake bindings built on the `delta-rs` Rust library. It reads and writes Delta Lake tables without Apache Spark or any JVM dependency, and it integrates with data manipulation libraries such as Pandas, Polars, and PyArrow. The library is actively developed and updated frequently; the current version is 1.5.0.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to create, append data to, and read different versions (time travel) of a Delta Lake table using `deltalake` and Pandas. It first creates an initial table, then appends new records, and finally shows how to access a previous state of the table by specifying a version.

import os
import shutil

import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Define a Delta Lake table path
table_path = "./tmp_delta_table"

# Remove any table left over from a previous run so the example starts fresh
if os.path.exists(table_path):
    shutil.rmtree(table_path)

# 1. Create a Pandas DataFrame
df = pd.DataFrame({"id": [1, 2], "value": ["A", "B"]})

# 2. Write the DataFrame to a Delta Lake table
write_deltalake(table_path, df)
print(f"Initial Delta table created at: {table_path}")

# 3. Load the Delta table
dt = DeltaTable(table_path)
print(f"Current table version: {dt.version()}")
print("Current table data:")
# to_string avoids the optional `tabulate` package that to_markdown requires
print(dt.to_pandas().to_string(index=False))

# 4. Append new data to the table
new_df = pd.DataFrame({"id": [3, 4], "value": ["C", "D"]})
write_deltalake(table_path, new_df, mode="append")
dt_updated = DeltaTable(table_path)
print(f"\nData appended. New table version: {dt_updated.version()}")
print("Updated table data:")
print(dt_updated.to_pandas().to_string(index=False))

# 5. Read an older version of the table (Time Travel)
dt_v0 = DeltaTable(table_path, version=0)
print("\nData from version 0 (time travel):")
print(dt_v0.to_pandas().to_string(index=False))

# Clean up temporary files (optional)
# shutil.rmtree(table_path)
