Delta Kernel Rust Sharing Wrapper

0.3.1 · active · verified Sun Apr 12

This package (PyPI: `delta-kernel-rust-sharing-wrapper`) provides Python bindings to the Rust `delta-kernel-rs` crate so that Python code can read Delta Lake tables. Its functionality is exposed through the `delta_kernel_python` module. It is a low-level building block that is usually consumed by higher-level Delta Lake Python clients. The current version is 0.3.1; releases are typically coordinated with updates to the underlying Rust kernel.

Warnings

Install

Imports

Quickstart

This quickstart uses `delta_kernel_python.scan_table` to read a Delta Lake table and inspects the result through PyArrow's Dataset API. It reads the table location from the `DELTA_TABLE_PATH` environment variable and falls back to `./my_local_delta_table` when the variable is unset; a Delta table must already exist at that location. Cloud storage paths additionally require credentials, configured for example through environment variables or a default credential provider.

import os

import delta_kernel_python
import pyarrow.dataset as ds

# Path to a Delta Lake table.
# For a runnable example, ensure this path points to a valid Delta table.
# For cloud paths (s3://, abfss://, etc.), ensure appropriate AWS/Azure credentials
# are configured (e.g., via environment variables or default credential providers).
# Example: Create a test table first using `delta-rs` or `pyspark`.
table_path = os.environ.get("DELTA_TABLE_PATH", "./my_local_delta_table")

print(f"Attempting to read Delta table at: {table_path}")

try:
    # Scan the Delta table to get an Arrow Dataset object.
    # This object can then be used for querying and reading data.
    dataset: ds.Dataset = delta_kernel_python.scan_table(table_path)

    print(f"\nSchema of the Delta table:\n{dataset.schema}")

    print("\nFirst 5 records from the table:")
    # Dataset.head(n) returns a pyarrow.Table with at most n rows, so the
    # output stays at 5 records instead of printing whole batches.
    first_rows = dataset.head(5)
    if first_rows.num_rows == 0:
        print("No records found.")
    else:
        print(first_rows.to_pylist())

except Exception as e:
    print(f"\nERROR: Could not read Delta table at '{table_path}'.")
    print("Please ensure the path is correct and points to an existing Delta table.")
    print(f"Original error: {e}")
    print("Tip: You can create a simple Delta table for testing, e.g., with `delta-rs`:")
    print("  import pandas as pd")
    print("  from deltalake import write_deltalake")
    print('  df = pd.DataFrame({"id": [1, 2], "value": ["A", "B"]})')
    print('  write_deltalake("./my_local_delta_table", df, mode="overwrite")')
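
The quickstart only reports a problem after the scan fails. As a rough pre-flight check, the path can be inspected first. The sketch below is not part of this package: `classify_table_path` and `CLOUD_SCHEMES` are hypothetical names introduced for illustration, and the scheme list is an assumption. It relies only on the fact that a Delta table on a local filesystem contains a `_delta_log` directory.

```python
import os
from urllib.parse import urlparse

# Assumption: an illustrative list of object-store URI schemes,
# not something defined by delta-kernel-rust-sharing-wrapper.
CLOUD_SCHEMES = {"s3", "s3a", "abfs", "abfss", "gs"}


def classify_table_path(path: str) -> str:
    """Return 'cloud', 'local-delta', or 'local-missing-log' for a table path.

    Hypothetical helper: cloud paths are only recognised by scheme (no
    network access), and anything else is treated as a plain filesystem path.
    """
    scheme = urlparse(path).scheme.lower()
    if scheme in CLOUD_SCHEMES:
        # Credentials must be configured separately; nothing is verified here.
        return "cloud"
    # A Delta table directory holds its transaction log under `_delta_log`.
    log_dir = os.path.join(path, "_delta_log")
    return "local-delta" if os.path.isdir(log_dir) else "local-missing-log"


if __name__ == "__main__":
    table_path = os.environ.get("DELTA_TABLE_PATH", "./my_local_delta_table")
    print(f"{table_path}: {classify_table_path(table_path)}")
```

A result of `local-missing-log` suggests the directory is missing or is not a Delta table, which is the most likely cause of the error branch in the quickstart. A result of `cloud` only means the path looks like an object-store URI; whether credentials are valid is still determined at scan time.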
