Delta Sharing Python Connector

1.4.1 · active · verified Sat Apr 11

The Delta Sharing Python Connector is a client library that implements the Delta Sharing Protocol, which enables secure, real-time exchange of large datasets across computing platforms without replicating the data. It lets users read shared Delta Lake and Apache Parquet tables as pandas DataFrames or Apache Spark DataFrames. The current version is 1.4.1, and minor releases are frequent.

Warnings

Install

Imports

Quickstart

This quickstart creates a `SharingClient`, lists the tables it can access, and loads the first five rows of a sample table into a pandas DataFrame. It assumes you have a Delta Sharing profile file (e.g., `open-datasets.share`) that holds the credentials for a Delta Sharing server. By default it reads the `COVID_19_NYT` table from the publicly available open-datasets share.
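
The profile file the quickstart relies on is a small JSON document. As a minimal sketch, assuming a version-1 bearer-token profile with the fields `shareCredentialsVersion`, `endpoint`, and `bearerToken` as described in the Delta Sharing protocol, the check below reports which of those fields are missing before the file is handed to the client. The helper name `missing_profile_fields` and the sample endpoint and token are hypothetical and not part of the library.

```python
import json
from typing import List

# Fields a version-1 bearer-token profile is expected to carry.
# Assumption: other credential versions may require different fields.
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def missing_profile_fields(profile_text: str) -> List[str]:
    """Return the required fields that are absent from a profile's JSON text."""
    profile = json.loads(profile_text)
    return [field for field in REQUIRED_FIELDS if field not in profile]


# Hypothetical profile content; the endpoint and token are placeholders.
sample = json.dumps({
    "shareCredentialsVersion": 1,
    "endpoint": "https://example.com/delta-sharing/",
    "bearerToken": "<token>",
})

missing = missing_profile_fields(sample)
print("Missing fields:", missing)
```

Running a check like this first may make a malformed profile easier to diagnose than the generic message printed by the quickstart's `except` block.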

import delta_sharing
import os

# Point to a Delta Sharing profile file, normally supplied by your data provider.
# In a real scenario this is a local path or a cloud storage path
# (e.g., s3://bucket/profile.share), and the file must be readable from here.
# For a public example, you can use:
# profile_file = "https://raw.githubusercontent.com/delta-io/delta-sharing/main/examples/open-datasets.share"
# For local testing, download
# https://databricks-datasets-oregon.s3-us-west-2.amazonaws.com/delta-sharing/share/open-datasets.share
# and save it as 'open-datasets.share' in your working directory.
# The DELTA_SHARING_PROFILE environment variable below overrides that default path.

profile_file = os.environ.get('DELTA_SHARING_PROFILE', 'open-datasets.share')

try:
    # Create a SharingClient
    client = delta_sharing.SharingClient(profile_file)

    # List all shared tables
    print("\nAvailable Shares, Schemas, and Tables:")
    tables = client.list_all_tables()
    if not tables:
        print("No tables found. Ensure your profile file is correct and has access.")
    for table in tables:
        print(f"  - Share: {table.share}, Schema: {table.schema}, Table: {table.name}")

    # Example: Load a specific table (replace with a table from your profile if needed)
    # Using the 'COVID_19_NYT' table from the open-datasets.share example
    # The format is <profile-path>#<share>.<schema>.<table>
    example_table_url = f"{profile_file}#delta_sharing.default.COVID_19_NYT"
    print(f"\nLoading data from: {example_table_url}")
    
    # Load the table as a pandas DataFrame, with a limit for demonstration
    df = delta_sharing.load_as_pandas(example_table_url, limit=5)
    print("\nFirst 5 rows of the DataFrame:")
    print(df)

except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure you have a valid Delta Sharing profile file configured and accessible.")
    print("You can set the DELTA_SHARING_PROFILE environment variable or download 'open-datasets.share'.")
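
The table URL used above follows the `<profile-path>#<share>.<schema>.<table>` pattern. The sketch below composes and splits such URLs with plain string handling. `build_table_url`, `parse_table_url`, and `TableRef` are hypothetical helpers written for illustration, not functions provided by `delta_sharing`.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """The four parts of a Delta Sharing table URL (illustrative type)."""
    profile: str
    share: str
    schema: str
    table: str


def build_table_url(profile: str, share: str, schema: str, table: str) -> str:
    """Compose <profile-path>#<share>.<schema>.<table>."""
    return f"{profile}#{share}.{schema}.{table}"


def parse_table_url(url: str) -> TableRef:
    """Split a table URL back into its profile path and table coordinates."""
    # Split on the last '#' so any earlier '#' in the profile path is preserved.
    profile, sep, fragment = url.rpartition("#")
    if not sep or not profile:
        raise ValueError(f"Missing '<profile-path>#' prefix in: {url!r}")
    parts = fragment.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected <share>.<schema>.<table> after '#' in: {url!r}")
    return TableRef(profile, *parts)


url = build_table_url("open-datasets.share", "delta_sharing", "default", "COVID_19_NYT")
ref = parse_table_url(url)
print(url)
print(ref.share, ref.schema, ref.table)
```

A URL built this way can be passed to `delta_sharing.load_as_pandas` in the same way as `example_table_url` in the quickstart.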
