Azure Synapse Spark Client Library
The Azure Synapse Spark Client Library for Python provides programmatic access to Azure Synapse Analytics Spark pools: submitting Spark batch jobs, managing Spark sessions, and interacting with the underlying Livy endpoints. As of version 0.7.0 it is part of the larger Azure SDK for Python ecosystem, and updates typically track new Synapse service features or general Azure SDK releases.
Common errors
- ModuleNotFoundError: No module named 'com.microsoft.spark.sqlanalytics'
  cause: This error often occurs when trying to import the `com.microsoft.spark.sqlanalytics` connector in a Synapse Spark notebook. It usually means the JAR for the Synapse SQL Pool connector is not correctly linked or installed, or that the import path is wrong for the Python context. The connector is a Java/Scala library, so a direct Python `import` of a `com.microsoft` package will not work without proper configuration, or a different API is intended. Note that the `azure-synapse-spark` client library covers job submission, not in-notebook Spark connector imports of this kind.
  fix: To interact with Azure Synapse SQL Pools from PySpark, you generally do not import `com.microsoft.spark.sqlanalytics` as a Python module. Instead, use Spark's `read` and `write` APIs with the format and options provided by the connector, and make sure the Synapse SQL Pool connector JAR is attached to your Spark pool. When writing to a SQL Pool, the `df.write.synapsesql` method is typically used without any `import com.microsoft.spark.sqlanalytics` statement in Python; see the sketch below.
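For illustration, here is a minimal PySpark sketch of that write path. It assumes a Synapse Spark 3.x runtime where the Dedicated SQL Pool connector is pre-installed; `mydb.dbo.mytable` is a placeholder target:

```python
# Runs inside a Synapse notebook, where `spark` is the pre-created SparkSession
# and the SQL Pool connector ships with the runtime; no Python import of
# com.microsoft.spark.sqlanalytics is needed.
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])

# Hypothetical target table: <database>.<schema>.<table> in the dedicated SQL pool.
df.write.mode("overwrite").synapsesql("mydb.dbo.mytable")
```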
- Livy session has failed. Session state: Error. Error code: AVAILABLE_WORKSPACE_CAPACITY_EXCEEDED
  cause: This error indicates that the Spark pool has run out of available vCores or other resources, preventing a new Livy session from being created or an existing one from running. It commonly happens when multiple users run jobs on the same Spark pool concurrently, or when the resources requested by a job exceed the allocated quota.
  fix: Reduce the number of vCores requested by your Spark job, or increase the vCore quota for your Azure Synapse workspace. If multiple users share a pool, consider configuring session-level resource allocation or creating separate Spark pools for different workloads. Also check whether other active sessions are holding resources unnecessarily and terminate them if possible. A sketch of submitting a deliberately small job with this library follows below.
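One way to shrink a job's footprint is to request smaller driver and executor sizes at submission time. Below is a hedged sketch using this library's `SparkBatchJobOptions`; the endpoint, pool name, and file path are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.synapse.spark import SparkClient
from azure.synapse.spark.models import SparkBatchJobOptions

client = SparkClient(
    credential=DefaultAzureCredential(),
    endpoint="https://<workspace>.dev.azuresynapse.net",  # placeholder
    spark_pool_name="<pool>",  # placeholder
)

# Request a deliberately small footprint so the job fits the pool's free vCores.
options = SparkBatchJobOptions(
    name="small-footprint-job",
    file="abfss://<container>@<account>.dfs.core.windows.net/jobs/job.py",  # placeholder
    driver_memory="4g",
    driver_cores=2,
    executor_memory="4g",
    executor_cores=2,
    executor_count=2,
)
job = client.spark_batch.create_spark_batch_job(options)
print(job.id, job.state)
```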
- Py4JJavaError: An error occurred while calling oXXXX.load. : org.apache.hadoop.security.AccessControlException: GETFILESTATUS failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.)
  cause: This `Py4JJavaError` typically occurs when a Spark job tries to access data in Azure Data Lake Storage Gen2 (ADLS Gen2) without the necessary permissions (e.g., the 'Storage Blob Data Contributor' or 'Storage Blob Data Owner' RBAC role) on the storage account. It is especially common when notebooks run inside a pipeline, because the pipeline's Managed Identity (or Service Principal) needs the permissions, not just the user's interactive login.
  fix: Grant the appropriate Role-Based Access Control (RBAC) permissions (at least 'Storage Blob Data Contributor') to the Managed Identity of your Synapse workspace, or to the user/Service Principal running the Spark job, on the ADLS Gen2 storage account(s) being accessed. Apply the permissions at the correct scope (container or file system root) and make sure all upstream folders have 'Execute' permissions. The sketch below shows the kind of access that triggers this check.
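For context, a minimal PySpark read over an `abfss://` path (account, container, and path below are placeholders). Whichever identity executes it, your user interactively or the workspace Managed Identity in a pipeline, needs the RBAC grant described above:

```python
# Placeholder account/container/path. The executing identity needs at least
# 'Storage Blob Data Reader' for reads ('Storage Blob Data Contributor' for
# writes) on the storage account or container, plus Execute on parent folders.
path = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/data/input.parquet"
df = spark.read.parquet(path)
df.show(5)
```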
- ModuleNotFoundError: No module named 'pyspark.errors'
  cause: This error primarily occurs in Azure Synapse Notebooks when a user installs or explicitly imports the `delta-spark` package, which conflicts with the Delta Lake integration already built into Synapse's Spark runtime. Synapse ships a pre-configured Delta Lake environment, and installing `delta-spark` can cause dependency conflicts or incorrect path resolution for `pyspark.errors`.
  fix: Instead of installing `delta-spark`, use the native Delta Lake support in Azure Synapse. The `delta.tables` module and other Delta Lake functionality are available directly without any `pip install`; `from delta.tables import DeltaTable` should work out of the box. If you have already installed `delta-spark`, remove it or restart the Spark pool and avoid reinstalling it. A short example of the native path follows below.
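A small sketch of that native path, runnable in a Synapse notebook without any `pip install` (the table path is a placeholder):

```python
from delta.tables import DeltaTable

# Placeholder ADLS Gen2 path for the Delta table.
table_path = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/delta/events"

# Write a small DataFrame as Delta using the runtime's built-in support...
df = spark.createDataFrame([(1, "open"), (2, "close")], ["id", "event"])
df.write.format("delta").mode("overwrite").save(table_path)

# ...then load it back through the DeltaTable API.
delta_table = DeltaTable.forPath(spark, table_path)
delta_table.toDF().show()
```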
Warnings
- breaking As a library in an early preview version (0.x.x), `azure-synapse-spark` may introduce breaking changes in minor version updates. Always review release notes when upgrading.
- gotcha `SparkClient` binds to a single Spark pool at construction time: `spark_pool_name` is a required constructor argument alongside `endpoint` and `credential`, and operations such as `get_spark_batch_jobs` then run against that pool. The workspace itself is identified by the `endpoint` URL. To work with multiple pools, create one client per pool.
- gotcha Authentication via `DefaultAzureCredential` relies on one of several sources: environment variables (e.g., `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_TENANT_ID`), an Azure CLI login (`az login`), or other Azure Identity credential sources. Without one of these set up, authentication will fail.
- gotcha `azure-synapse-spark` depends on `six`, but some environments do not pull it in automatically (for example, when transitive dependencies are not resolved correctly, or in minimal Python environments), which surfaces as a `ModuleNotFoundError` at import time. If you hit this, `pip install six` resolves it.
Install
- pip install azure-synapse-spark azure-identity
Imports
- SparkClient
from azure.synapse.spark import SparkClient
- DefaultAzureCredential
from azure.identity import DefaultAzureCredential
Quickstart
```python
import os

from azure.identity import DefaultAzureCredential
from azure.synapse.spark import SparkClient

# Replace with your Synapse workspace name and a Spark pool name
synapse_workspace_name = os.environ.get("SYNAPSE_WORKSPACE_NAME", "your_synapse_workspace_name")
spark_pool_name = os.environ.get("SYNAPSE_SPARK_POOL_NAME", "your_spark_pool_name")
endpoint = f"https://{synapse_workspace_name}.dev.azuresynapse.net"

if synapse_workspace_name == "your_synapse_workspace_name" or spark_pool_name == "your_spark_pool_name":
    print("Please set the SYNAPSE_WORKSPACE_NAME and SYNAPSE_SPARK_POOL_NAME environment variables "
          "or replace the placeholder values in the code.")
else:
    try:
        # Obtain a credential from Azure Identity. Ensure you're logged in via
        # Azure CLI/VS Code, or that the service principal environment variables are set.
        credential = DefaultAzureCredential()

        # Create a SparkClient. The workspace is identified by the endpoint URL,
        # and the Spark pool is bound at construction time.
        spark_client = SparkClient(
            credential=credential,
            endpoint=endpoint,
            spark_pool_name=spark_pool_name,
        )

        # List Spark batch jobs in the pool (example operation)
        print(f"Listing Spark batch jobs for Spark pool '{spark_pool_name}' "
              f"in workspace '{synapse_workspace_name}'...")
        batch_jobs = spark_client.spark_batch.get_spark_batch_jobs()

        print(f"Found {batch_jobs.total} Spark batch jobs:")
        for job in batch_jobs.sessions or []:
            print(f"  - Job ID: {job.id}, Name: {job.name}, State: {job.state}")
    except Exception as e:
        print(f"Error interacting with Azure Synapse Spark: {e}")
        print("Ensure your Azure credentials are set up and that you have permissions "
              "to the Synapse workspace and Spark pool.")
```