Delta Sharing Python Connector
The Delta Sharing Python Connector is a client library that implements the Delta Sharing Protocol, enabling secure, real-time exchange of large datasets across different computing platforms without data replication. It allows users to read shared Delta Lake and Apache Parquet tables as pandas DataFrames or Apache Spark DataFrames. The current version is 1.4.1, with frequent minor releases providing continuous improvements and feature enhancements.
Warnings
- gotcha Linux users may encounter installation failures with `delta-kernel-rust-sharing-wrapper` if the system `glibc` is older than 2.31 or no pre-built Python wheel is available for their environment.
- gotcha Delta Sharing profile files (`.share`) contain sensitive credentials (e.g., bearer tokens, OAuth client secrets). These files must be stored securely and not exposed in public repositories or insecure locations.
- gotcha When using `load_as_spark()` to read shared tables as Spark DataFrames, you must be running in a PySpark environment with the Apache Spark Connector for Delta Sharing properly configured and installed.
- gotcha Bearer tokens used for open sharing have a maximum validity of one year. Recipients must coordinate with data providers for token rotation and renewal to maintain access.
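A profile file is a small JSON document holding the sharing server endpoint and credentials. The sketch below writes a minimal open-sharing (bearer token) profile; the endpoint and token are placeholders, not real credentials.

```python
import json

# Hedged sketch of an open-sharing profile; all values are placeholders.
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",  # provider's server URL (placeholder)
    "bearerToken": "dapi-xxxxxxxxxxxx",                        # placeholder token; keep real tokens secret
    "expirationTime": "2026-01-01T00:00:00.0Z",                # optional; bearer tokens live at most one year
}

with open("example.share", "w") as f:
    json.dump(profile, f, indent=2)
```

Because this file embeds a credential, treat it like a password: keep it out of version control and public locations, as the warning above notes.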
Install
- pip install delta-sharing
- pip install delta-sharing[s3]
Imports
- SharingClient
from delta_sharing import SharingClient
- load_as_pandas
from delta_sharing import load_as_pandas
- load_as_spark
from delta_sharing import load_as_spark
- list_all_tables (a method on a `SharingClient` instance, not a top-level import)
client.list_all_tables()
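Shared tables are addressed by a URL that joins the profile path with the fully qualified table name, in the form `<profile-path>#<share>.<schema>.<table>`. A small illustration of building one (the names are examples from the public open dataset):

```python
# Table URL format: <profile-path>#<share>.<schema>.<table>
# All names below are illustrative, taken from the public open-datasets share.
profile_path = "open-datasets.share"
share, schema, table = "delta_sharing", "default", "COVID_19_NYT"

table_url = f"{profile_path}#{share}.{schema}.{table}"
print(table_url)  # open-datasets.share#delta_sharing.default.COVID_19_NYT
```

The same URL string is what `load_as_pandas` and `load_as_spark` accept.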
Quickstart
import delta_sharing
import os
# Point to a Delta Sharing profile file (e.g., downloaded from a data provider).
# For local testing, download the public example profile from
#   https://databricks-datasets-oregon.s3-us-west-2.amazonaws.com/delta-sharing/share/open-datasets.share
# (also mirrored at https://raw.githubusercontent.com/delta-io/delta-sharing/main/examples/open-datasets.share)
# and save it as 'open-datasets.share' in your working directory.
# In a real scenario, this would be a local path or a cloud storage path
# (e.g., s3://bucket/profile.share).
profile_file = os.environ.get('DELTA_SHARING_PROFILE', 'open-datasets.share')
try:
    # Create a SharingClient
    client = delta_sharing.SharingClient(profile_file)

    # List all shared tables
    print("\nAvailable Shares, Schemas, and Tables:")
    tables = client.list_all_tables()
    if not tables:
        print("No tables found. Ensure your profile file is correct and has access.")
    for table in tables:
        print(f" - Share: {table.share}, Schema: {table.schema}, Table: {table.name}")

    # Example: load a specific table (replace with a table from your profile if needed).
    # Uses the 'COVID_19_NYT' table from the open-datasets.share example.
    # The URL format is <profile-path>#<share>.<schema>.<table>
    example_table_url = f"{profile_file}#delta_sharing.default.COVID_19_NYT"
    print(f"\nLoading data from: {example_table_url}")

    # Load the table as a pandas DataFrame, limited to 5 rows for demonstration
    df = delta_sharing.load_as_pandas(example_table_url, limit=5)
    print("\nFirst 5 rows of the DataFrame:")
    print(df)
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure you have a valid Delta Sharing profile file configured and accessible.")
    print("You can set the DELTA_SHARING_PROFILE environment variable or download 'open-datasets.share'.")
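For Spark users, the warning above about `load_as_spark()` means the call must run inside an active PySpark session with the Delta Sharing Spark connector on the classpath. A hedged sketch, assuming the session was started with something like `pyspark --packages io.delta:delta-sharing-spark_2.12:<version>` (substitute the version matching your Spark build); the table URL is the public example table, and the import is guarded so the snippet degrades gracefully where PySpark or delta-sharing are absent:

```python
# Sketch: loading a shared table as a Spark DataFrame.
# Assumes an active PySpark session with the Delta Sharing Spark connector
# (Maven coordinate io.delta:delta-sharing-spark_2.12) installed.
table_url = "open-datasets.share#delta_sharing.default.COVID_19_NYT"

try:
    import delta_sharing

    # Returns a pyspark.sql.DataFrame; only valid inside a Spark session.
    df = delta_sharing.load_as_spark(table_url)
    df.show(5)
except Exception as e:
    # No PySpark/connector available (or no profile file): explain and move on.
    print(f"Spark example skipped: {e}")
```

Unlike `load_as_pandas`, which pulls data into local memory, `load_as_spark` distributes the read across the Spark cluster, so it is the better fit for large shared tables.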