Delta Kernel Rust Sharing Wrapper
This package (PyPI: `delta-kernel-rust-sharing-wrapper`) provides Python bindings to the `delta-kernel-rs` (Rust) crate, enabling Python users to read Delta Lake tables. It exports its functionality via the `delta_kernel_python` module. It serves as a foundational, low-level component, often consumed by higher-level Delta Lake Python clients. Current version is 0.3.1; releases are typically coordinated with updates to the underlying Rust kernel.
Warnings
- gotcha: The PyPI package name (`delta-kernel-rust-sharing-wrapper`) differs from the Python module name (`delta_kernel_python`). Always import using `import delta_kernel_python`.
- gotcha: This library has specific `pyarrow` version dependencies (e.g., `pyarrow >=10.0.1,<16` for version 0.3.1). Installing an incompatible `pyarrow` version can lead to runtime errors or crashes due to C/Rust FFI mismatches.
- breaking: As this library is pre-1.0.0, its API (`delta_kernel_python` module) is subject to change without strict backward compatibility guarantees. Expect potential breaking changes in minor version updates.
- gotcha: When reading tables from cloud storage (e.g., S3, ADLS Gen2), this library relies on the underlying storage client (e.g., `object_store` Rust crate) to pick up credentials. Ensure your environment variables (e.g., `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` for S3) or cloud-provider specific configurations are correctly set.
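The `pyarrow` range quoted in the warnings can be checked before importing the wrapper. The following is a minimal sketch using only the standard library. The bounds are copied from the warning for version 0.3.1 and may differ for other releases; the helper names (`parse_release`, `pyarrow_version_ok`, `installed_pyarrow_ok`) are hypothetical and not part of this package.

```python
import re
from importlib import metadata

# Bounds taken from the warning for wrapper version 0.3.1. Other releases may
# declare a different range, so treat these values as assumptions.
MIN_PYARROW = (10, 0, 1)  # inclusive lower bound
MAX_PYARROW = (16,)       # exclusive upper bound


def parse_release(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '15.0.2.dev1' -> (15, 0, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def pyarrow_version_ok(version: str) -> bool:
    """Return True if `version` lies within [MIN_PYARROW, MAX_PYARROW)."""
    return MIN_PYARROW <= parse_release(version) < MAX_PYARROW


def installed_pyarrow_ok():
    """Check the installed pyarrow; return None if pyarrow is not installed."""
    try:
        return pyarrow_version_ok(metadata.version("pyarrow"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pyarrow within supported range:", installed_pyarrow_ok())
```

Running this check in the target environment before the first import gives a readable message instead of a low-level crash if the versions do not line up.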
Install
pip install delta-kernel-rust-sharing-wrapper
Imports
- scan_table
from delta_kernel_python import scan_table
Quickstart
import delta_kernel_python
import pyarrow.dataset as ds
import os
# Path to a Delta Lake table.
# For a runnable example, ensure this path points to a valid Delta table.
# For cloud paths (s3://, abfss://, etc.), ensure appropriate AWS/Azure credentials
# are configured (e.g., via environment variables or default credential providers).
# Example: Create a test table first using `delta-rs` or `pyspark`.
table_path = os.environ.get("DELTA_TABLE_PATH", "./my_local_delta_table")
print(f"Attempting to read Delta table at: {table_path}")
try:
    # Scan the Delta table to get an Arrow Dataset object.
    # This object can then be used for querying and reading data.
    dataset: ds.Dataset = delta_kernel_python.scan_table(table_path)
    print(f"\nSchema of the Delta table:\n{dataset.schema}")
    print("\nFirst 5 records from the table:")
    # Read data in batches (PyArrow RecordBatches) and stop after 5 rows.
    limit = 5
    count = 0
    for batch in dataset.to_batches():
        # Trim the batch so that no more than `limit` rows are printed in total.
        rows = batch.slice(0, limit - count)
        print(rows.to_pylist())
        count += rows.num_rows
        if count >= limit:
            break
    if count == 0:
        print("No records found or processed.")
except Exception as e:
    print(f"\nERROR: Could not read Delta table at '{table_path}'.")
    print("Please ensure the path is correct and points to an existing Delta table.")
    print(f"Original error: {e}")
    print("Tip: You can create a simple Delta table for testing, e.g., with `delta-rs`:")
    print(" import pandas as pd\n from deltalake import write_deltalake\n df = pd.DataFrame({\"id\": [1, 2], \"value\": [\"A\", \"B\"]})\n write_deltalake(\"./my_local_delta_table\", df, mode=\"overwrite\")")
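For cloud paths, a small pre-flight check can report which credential environment variables are unset before the scan is attempted. This is a sketch under stated assumptions: only the two S3 variables named in the warnings are covered, and the mapping `REQUIRED_ENV_BY_SCHEME` and the helper `missing_credentials` are hypothetical names, not part of this package. A missing variable is not necessarily an error, since credentials may also come from provider-specific configuration.

```python
import os
from urllib.parse import urlparse

# Hypothetical mapping from URL scheme to the environment variables mentioned
# in this document. Extend it for other providers as your setup requires.
REQUIRED_ENV_BY_SCHEME = {
    "s3": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
}


def missing_credentials(table_path: str, environ=None) -> list:
    """Return the expected variables that are unset or empty for this path."""
    env = os.environ if environ is None else environ
    scheme = urlparse(table_path).scheme.lower()
    required = REQUIRED_ENV_BY_SCHEME.get(scheme, ())
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    path = os.environ.get("DELTA_TABLE_PATH", "./my_local_delta_table")
    missing = missing_credentials(path)
    if missing:
        print(f"Warning: unset variables for {path}: {', '.join(missing)}")
    else:
        print(f"No missing variables detected for {path}.")
```

Local paths have no URL scheme, so the check returns an empty list for them and only reports on schemes present in the mapping.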