{"id":3958,"library":"delta-sharing","title":"Delta Sharing Python Connector","description":"The Delta Sharing Python Connector is a client library that implements the Delta Sharing Protocol, enabling secure, real-time exchange of large datasets across different computing platforms without data replication. It reads shared Delta Lake and Apache Parquet tables as pandas DataFrames or Apache Spark DataFrames. The current version is 1.4.1; minor releases are frequent.","status":"active","version":"1.4.1","language":"en","source_language":"en","source_url":"https://github.com/delta-io/delta-sharing","tags":["data sharing","delta lake","cloud storage","dataframe","pandas","spark","etl","data lakehouse"],"install":[{"cmd":"pip install delta-sharing","lang":"bash","label":"Base Installation"},{"cmd":"pip install delta-sharing[s3]","lang":"bash","label":"With S3 Cloud Storage Support (example)"}],"dependencies":[{"reason":"Required Python version.","package":"python","version":">=3.8"},{"reason":"For loading shared tables as DataFrames.","package":"pandas","optional":false},{"reason":"Apache Arrow library for efficient columnar data handling.","package":"pyarrow","optional":false},{"reason":"Filesystem specification for abstracting local and remote storage paths, including cloud storage.","package":"fsspec","optional":false},{"reason":"Rust-based kernel for efficient data reading, a core internal dependency.","package":"delta-kernel-rust-sharing-wrapper","optional":false},{"reason":"Optional: For loading shared tables as Spark DataFrames and distributed processing. 
Requires Apache Spark Connector setup.","package":"pyspark","optional":true},{"reason":"HTTP client library for making API requests.","package":"requests","optional":false},{"reason":"Asynchronous HTTP client, used internally.","package":"aiohttp","optional":false},{"reason":"URL parsing library, used internally.","package":"yarl","optional":false},{"reason":"JSON Web Crypto implementation, used for OAuth/OIDC authentication.","package":"jwcrypto","optional":false}],"imports":[{"note":"The primary client for interacting with Delta Sharing servers.","symbol":"SharingClient","correct":"from delta_sharing import SharingClient"},{"note":"Function to load a shared table directly into a pandas DataFrame.","symbol":"load_as_pandas","correct":"from delta_sharing import load_as_pandas"},{"note":"Function to load a shared table directly into a PySpark DataFrame (requires PySpark environment).","symbol":"load_as_spark","correct":"from delta_sharing import load_as_spark"},{"note":"Method of SharingClient to list all available tables.","symbol":"list_all_tables","correct":"client.list_all_tables()"}],"quickstart":{"code":"import delta_sharing\nimport os\n\n# Point to a Delta Sharing profile file (e.g., downloaded from a data provider)\n# For a public example, you can use:\n# profile_file = \"https://raw.githubusercontent.com/delta-io/delta-sharing/main/examples/open-datasets.share\"\n# In a real scenario, this would be a local path or cloud storage path (e.g., s3://bucket/profile.share)\n# Ensure your profile file (e.g., 'config.share') is accessible.\n# For local testing, download from https://databricks-datasets-oregon.s3-us-west-2.amazonaws.com/delta-sharing/share/open-datasets.share\n# and save it as 'open-datasets.share' in your working directory.\n\nprofile_file = os.environ.get('DELTA_SHARING_PROFILE', 'open-datasets.share')\n\ntry:\n    # Create a SharingClient\n    client = delta_sharing.SharingClient(profile_file)\n\n    # List all shared tables\n    
print(\"\\nAvailable Shares, Schemas, and Tables:\")\n    tables = client.list_all_tables()\n    if not tables:\n        print(\"No tables found. Ensure your profile file is correct and has access.\")\n    for table in tables:\n        print(f\"  - Share: {table.share}, Schema: {table.schema}, Table: {table.name}\")\n\n    # Example: Load a specific table (replace with a table from your profile if needed)\n    # Using the 'COVID_19_NYT' table from the open-datasets.share example\n    # The format is <profile-path>#<share>.<schema>.<table>\n    example_table_url = f\"{profile_file}#delta_sharing.default.COVID_19_NYT\"\n    print(f\"\\nLoading data from: {example_table_url}\")\n    \n    # Load the table as a pandas DataFrame, with a limit for demonstration\n    df = delta_sharing.load_as_pandas(example_table_url, limit=5)\n    print(\"\\nFirst 5 rows of the DataFrame:\")\n    print(df)\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"Please ensure you have a valid Delta Sharing profile file configured and accessible.\")\n    print(\"You can set the DELTA_SHARING_PROFILE environment variable or download 'open-datasets.share'.\")","lang":"python","description":"This quickstart demonstrates how to initialize the Delta Sharing client, list available shared tables, and load a sample table into a pandas DataFrame. It assumes you have a Delta Sharing profile file (e.g., `open-datasets.share`) that provides credentials to a Delta Sharing server. For demonstration, it attempts to load a publicly available dataset."},"warnings":[{"fix":"Ensure your Linux system has `glibc >= 2.31`. 
If installation still fails, install the Rust toolchain (e.g., via `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`), because `pip` may try to build the Rust component from source.","message":"Linux users may encounter installation issues for `delta-kernel-rust-sharing-wrapper` if the `glibc` version is older than 2.31 or if no pre-built Python wheel is available for their environment.","severity":"gotcha","affected_versions":">=1.1.0"},{"fix":"Store profile files in secure, access-controlled locations (e.g., local filesystem with restricted permissions, cloud storage with IAM policies, or secret management services). Avoid hardcoding credentials directly in code. For OIDC authentication, ensure `clientId` and `clientSecret` are managed securely.","message":"Delta Sharing profile files (`.share`) contain sensitive credentials (e.g., bearer tokens, OAuth client secrets). Store these files securely and never expose them in public repositories or insecure locations.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Refer to the official Delta Sharing documentation for setting up the Apache Spark Connector for Delta Sharing in your Spark environment, including necessary Spark package configurations (e.g., `--packages io.delta:delta-sharing-spark_2.12:<version>`).","message":"`load_as_spark()` reads shared tables as Spark DataFrames only when run in a PySpark environment with the Apache Spark Connector for Delta Sharing installed and configured.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Implement a process for regularly renewing and rotating bearer tokens with your data provider well before their expiration. For OIDC federation, token management is typically handled more dynamically.","message":"Bearer tokens used for open sharing expire; providers may cap their validity (for example, at one year). 
Recipients must coordinate with data providers for token rotation and renewal to maintain access.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}