Azure Synapse Spark Client Library
The Azure Synapse Spark Client Library for Python lets you work with Azure Synapse Analytics Spark pools programmatically: submitting Spark batch jobs, managing Spark sessions, and calling the underlying Livy endpoints. It is part of the Azure SDK for Python and is at version 0.7.0; updates typically track new Synapse service features or general Azure SDK releases.
Warnings
- breaking As a library in an early preview version (0.x.x), `azure-synapse-spark` may introduce breaking changes in minor version updates. Always review release notes when upgrading.
- gotcha The Spark pool is bound to the client, not to individual calls. `SparkClient` takes `spark_pool_name` as a constructor argument alongside `credential` and `endpoint`, and operations such as `get_spark_batch_jobs` then run against that pool. The workspace is identified by the `endpoint` URL. Create a separate client for each pool you need to work with.
- gotcha `DefaultAzureCredential` tries several credential sources in turn, including environment variables (e.g., `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_TENANT_ID`), an Azure CLI login (`az login`), and managed identity. If none of these is configured, authentication fails.
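The environment-variable route mentioned in the last warning can be checked before a credential is constructed. This is a minimal sketch, assuming service-principal authentication through the three variables named there; `missing_service_principal_vars` is a hypothetical helper for illustration, not part of `azure-identity`. A non-empty result does not mean authentication will fail, since `DefaultAzureCredential` may still succeed through another source such as `az login`.

```python
import os

# Variables named in the warning above for service-principal authentication.
SERVICE_PRINCIPAL_VARS = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID")


def missing_service_principal_vars(env=None):
    """Return the service-principal variables that are unset or empty.

    Hypothetical helper for illustration; not part of azure-identity.
    """
    env = os.environ if env is None else env
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_service_principal_vars()
    if missing:
        print("Not set:", ", ".join(missing))
        print("DefaultAzureCredential will need another source, such as `az login`.")
    else:
        print("Service-principal variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.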
Install
-
pip install azure-synapse-spark azure-identity
Imports
- SparkClient
from azure.synapse.spark import SparkClient
- DefaultAzureCredential
from azure.identity import DefaultAzureCredential
Quickstart
import os
from azure.identity import DefaultAzureCredential
from azure.synapse.spark import SparkClient
# Replace with your Synapse workspace name and a Spark pool name
synapse_workspace_name = os.environ.get("SYNAPSE_WORKSPACE_NAME", "your_synapse_workspace_name")
spark_pool_name = os.environ.get("SYNAPSE_SPARK_POOL_NAME", "your_spark_pool_name")
endpoint = f"https://{synapse_workspace_name}.dev.azuresynapse.net"
if synapse_workspace_name == "your_synapse_workspace_name" or spark_pool_name == "your_spark_pool_name":
    print("Please set SYNAPSE_WORKSPACE_NAME and SYNAPSE_SPARK_POOL_NAME environment variables "
          "or replace the placeholder values in the code.")
else:
    try:
        # Obtain a credential from Azure Identity. Ensure you are logged in via Azure CLI/VS Code, or that env vars are set.
        credential = DefaultAzureCredential()
        # Create a SparkClient bound to one Spark pool
        spark_client = SparkClient(credential=credential, endpoint=endpoint, spark_pool_name=spark_pool_name)
        # List Spark batch jobs in that pool (example operation)
        print(f"Listing Spark batch jobs for Spark Pool '{spark_pool_name}' in workspace '{synapse_workspace_name}'...")
        batch_jobs_collection = spark_client.spark_batch.get_spark_batch_jobs()
        # The collection exposes its jobs as `sessions`; guard against a missing list
        jobs = batch_jobs_collection.sessions or []
        print(f"Found {len(jobs)} Spark batch jobs:")
        for job in jobs:
            print(f"  - Job ID: {job.id}, Name: {job.name}, State: {job.state}")
    except Exception as e:
        print(f"Error interacting with Azure Synapse Spark: {e}")
        print("Ensure your Azure credentials are set up and you have permissions to the Synapse workspace and Spark pool.")
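The quickstart prints each job's `state` once. A common follow-up is to wait until a particular job finishes. The sketch below is one way to do that, under two assumptions that should be verified against the installed version: that the client exposes `spark_batch.get_spark_batch_job(batch_id)` returning an object with a `state` attribute, and that the terminal states follow the Livy batch states `success`, `dead`, `killed`, and `error`. `wait_for_batch_job` and its parameters are hypothetical names, not part of the library.

```python
import time

# Assumed terminal Livy batch states; check them against your service version.
TERMINAL_STATES = {"success", "dead", "killed", "error"}


def wait_for_batch_job(client, batch_id, poll_seconds=10.0, timeout_seconds=1800.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll a Spark batch job until it reaches a terminal state, then return it.

    `client` is expected to behave like SparkClient. Raises TimeoutError if
    the job is still not terminal once `timeout_seconds` have elapsed.
    """
    deadline = clock() + timeout_seconds
    while True:
        job = client.spark_batch.get_spark_batch_job(batch_id)
        # The state may arrive as a plain string or as an enum member.
        state = getattr(job.state, "value", job.state)
        if str(state).lower() in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"Batch job {batch_id} still in state {state!r}")
        sleep(poll_seconds)
```

With a client created as in the quickstart, a call would look like `job = wait_for_batch_job(spark_client, batch_id)`, where `batch_id` is the `id` of a job from the listing. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.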