Azure Synapse Spark Client Library

0.7.0 · active · verified Thu Apr 09

The Azure Synapse Spark Client Library for Python lets you interact with Azure Synapse Analytics Spark pools: submit Spark batch jobs, manage Spark sessions, and work with the underlying Livy endpoints programmatically. As of version 0.7.0, it is part of the larger Azure SDK for Python ecosystem and typically sees updates aligned with new Synapse service features or general Azure SDK releases.

Install

pip install azure-synapse-spark azure-identity

Imports

import os
from azure.identity import DefaultAzureCredential
from azure.synapse.spark import SparkClient

Quickstart

This quickstart demonstrates how to authenticate with Azure Synapse and list Spark batch jobs within a specified Spark pool. You'll need to set `SYNAPSE_WORKSPACE_NAME` and `SYNAPSE_SPARK_POOL_NAME` environment variables (or hardcode them) and ensure your Azure credentials are configured (e.g., via `az login`).

import os
from azure.identity import DefaultAzureCredential
from azure.synapse.spark import SparkClient

# Replace with your Synapse workspace name and a Spark pool name
synapse_workspace_name = os.environ.get("SYNAPSE_WORKSPACE_NAME", "your_synapse_workspace_name")
spark_pool_name = os.environ.get("SYNAPSE_SPARK_POOL_NAME", "your_spark_pool_name")
endpoint = f"https://{synapse_workspace_name}.dev.azuresynapse.net"

if synapse_workspace_name == "your_synapse_workspace_name" or spark_pool_name == "your_spark_pool_name":
    print("Please set SYNAPSE_WORKSPACE_NAME and SYNAPSE_SPARK_POOL_NAME environment variables ",
          "or replace the placeholder values in the code.")
else:
    try:
        # Obtain a credential from Azure Identity. Ensure you're logged in via Azure CLI/VS Code, or env vars are set.
        credential = DefaultAzureCredential()

        # Create a SparkClient bound to the target Spark pool
        # (in azure-synapse-spark 0.7.0, the pool is a constructor argument)
        spark_client = SparkClient(
            credential=credential,
            endpoint=endpoint,
            spark_pool_name=spark_pool_name,
        )

        # List Spark batch jobs in the pool (example operation)
        print(f"Listing Spark batch jobs for Spark Pool '{spark_pool_name}' in workspace '{synapse_workspace_name}'...")
        batch_jobs_collection = spark_client.spark_batch.get_spark_batch_jobs()

        print(f"Found {batch_jobs_collection.total} Spark batch jobs:")
        for job in batch_jobs_collection.sessions or []:
            print(f"  - Job ID: {job.id}, Name: {job.name}, State: {job.state}")

    except Exception as e:
        print(f"Error interacting with Azure Synapse Spark: {e}")
        print("Ensure your Azure credentials are set up and you have permissions to the Synapse workspace and Spark pool.")
