Azure ML Pipeline

1.62.0 · active · verified Thu Apr 16

The `azureml-pipeline` library is part of the Azure Machine Learning V1 Python SDK and is used to build, optimize, and manage multi-step machine learning workflows as pipelines. It lets you define steps, connect them through data dependencies, and run the resulting pipeline on Azure compute targets such as AmlCompute clusters. The current version is 1.62.0; releases track those of the broader Azure ML V1 SDK and typically arrive every one to two months.
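In the V1 SDK, a `Pipeline` does not simply run its `steps` list in the order given: the execution graph is derived from each step's inputs and outputs (plus any explicit `run_after` ordering), and steps with no dependency between them may run in parallel. The snippet below is a toy model of that idea in plain Python, not the library's implementation. The step names and data names are hypothetical, and the standard-library `graphlib.TopologicalSorter` stands in for the service's scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each lists the data names it consumes and produces.
STEPS = {
    "prep": {"inputs": ["raw_data"], "outputs": ["clean_data"]},
    "train": {"inputs": ["clean_data"], "outputs": ["model"]},
    "profile": {"inputs": ["clean_data"], "outputs": ["report"]},
    "register": {"inputs": ["model"], "outputs": []},
}


def build_graph(steps):
    """Map each step to the set of steps whose outputs it consumes."""
    producer = {}
    for name, spec in steps.items():
        for out in spec["outputs"]:
            producer[out] = name
    graph = {}
    for name, spec in steps.items():
        # Inputs with no producing step (e.g. "raw_data") are treated as external data.
        graph[name] = {producer[i] for i in spec["inputs"] if i in producer}
    return graph


def execution_waves(steps):
    """Group steps into waves; steps in the same wave do not depend on each other."""
    sorter = TopologicalSorter(build_graph(steps))
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(STEPS), start=1):
        print(f"wave {i}: {wave}")
```

Here `train` and `profile` land in the same wave because both depend only on `prep`, which mirrors how independent pipeline steps can be scheduled side by side.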

Quickstart

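This quickstart creates and submits a simple Azure ML pipeline with `azureml-pipeline`. It obtains a Workspace, gets or creates a compute cluster, writes a Python script for a single step, and then assembles that step into a pipeline and submits it as an experiment run. The workspace is located through a `config.json` file or, failing that, the `AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, and `AZURE_WORKSPACE_NAME` environment variables. Authentication is separate: by default the SDK uses your Azure CLI credentials if available and otherwise prompts for an interactive login.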
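When `config.json` is not available, a small pre-flight check can report every missing environment variable at once, before any SDK call is made. The sketch below is plain Python and uses the same three variable names the quickstart reads; the function name `missing_workspace_settings` is hypothetical and not part of the Azure ML SDK.

```python
import os

# The same three variables the quickstart's fallback branch reads.
REQUIRED_VARS = ("AZURE_SUBSCRIPTION_ID", "AZURE_RESOURCE_GROUP", "AZURE_WORKSPACE_NAME")


def missing_workspace_settings(env=None):
    """Return required variable names that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_workspace_settings()
    if missing:
        print("Missing workspace settings:", ", ".join(missing))
    else:
        print("All workspace settings are present.")
```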

import os
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# --- 1. Get Azure ML Workspace ---
# Locate the workspace from config.json, falling back to environment variables
try:
    ws = Workspace.from_config()
    print(f"Workspace loaded from config: {ws.name}")
except Exception:
    print("Could not load config.json; falling back to environment variables...")
    # Set these variables (or replace the placeholders) with your subscription, resource group, and workspace
    subscription_id = os.environ.get("AZURE_SUBSCRIPTION_ID", "<YOUR_SUBSCRIPTION_ID>")
    resource_group = os.environ.get("AZURE_RESOURCE_GROUP", "<YOUR_RESOURCE_GROUP>")
    workspace_name = os.environ.get("AZURE_WORKSPACE_NAME", "<YOUR_WORKSPACE_NAME>")
    # Fail early if any of the three values is still a placeholder
    if any(value.startswith("<YOUR_") for value in (subscription_id, resource_group, workspace_name)):
        raise ValueError("Please set AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, AZURE_WORKSPACE_NAME or provide config.json")
    ws = Workspace(subscription_id, resource_group, workspace_name)
    print(f"Workspace loaded from environment: {ws.name}")

# --- 2. Define Compute Target ---
cpu_cluster_name = "cpu-cluster-qs"  # Name for your compute cluster
try:
    # Reuse the cluster if it already exists in the workspace
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print(f"Found existing compute target: {cpu_cluster_name}")
except ComputeTargetException:
    print(f"Creating a new compute target: {cpu_cluster_name}")
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS3_V2",
        min_nodes=0,  # scale down to zero nodes when idle
        max_nodes=1
    )
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    cpu_cluster.wait_for_completion(show_output=True)

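# --- 3. Create a Python script for a pipeline step ---
# Keep step code in its own folder: the whole source_directory is snapshotted
# and uploaded with the run, so avoid pointing it at a large working directory.
source_dir = "pipeline_step_src"
os.makedirs(source_dir, exist_ok=True)
script_name = "my_pipeline_step_script.py"
with open(os.path.join(source_dir, script_name), "w") as f:
    f.write("import os; print(f'Hello from Azure ML Pipeline step on {os.uname().nodename}')\n")

# --- 4. Define a Pipeline Step ---
step = PythonScriptStep(
    name="HelloStep",
    script_name=script_name,  # path relative to source_directory
    compute_target=cpu_cluster,
    source_directory=source_dir,  # folder uploaded as the step's snapshot
    allow_reuse=True  # reuse a previous run's results if the script, source files, inputs, and parameters are unchanged
)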

# --- 5. Create and Submit the Pipeline ---
pipeline = Pipeline(workspace=ws, steps=[step])
print("Submitting pipeline...")
pipeline_run = Experiment(ws, 'MyFirstPipelineExperiment').submit(pipeline)
print(f"Pipeline submitted. Run ID: {pipeline_run.id}")
# Uncomment the line below to wait for the pipeline run to complete
# pipeline_run.wait_for_completion(show_output=True)
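The quickstart's single step takes no inputs. When steps exchange data, the usual V1 SDK pattern is to pass a `PipelineData` object in the producing step's `arguments` and `outputs` (and in the consuming step's `arguments` and `inputs`); at run time the script receives a filesystem path in place of that object. Below is a hypothetical step script showing the script side of that hand-off in plain Python. The `--role`, `--output_dir`, and `--input_dir` flags and the `data.txt` file name are assumptions for illustration, and the functions can be exercised locally against a temporary directory.

```python
import argparse
import platform
from pathlib import Path

# Hypothetical file name shared by the producer and consumer roles.
DATA_FILE = "data.txt"


def produce(output_dir):
    """Producer role: write a small file into the directory the pipeline assigns."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / DATA_FILE
    target.write_text(f"hello from {platform.node()}\n")
    return target


def consume(input_dir):
    """Consumer role: read the file written by the upstream producer."""
    return (Path(input_dir) / DATA_FILE).read_text().strip()


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical two-role step script")
    parser.add_argument("--role", choices=["produce", "consume"], required=True)
    # In a pipeline run, these values would be paths substituted for PipelineData objects.
    parser.add_argument("--output_dir")
    parser.add_argument("--input_dir")
    args = parser.parse_args(argv)
    if args.role == "produce":
        if not args.output_dir:
            parser.error("--output_dir is required when --role is produce")
        print(f"wrote {produce(args.output_dir)}")
    else:
        if not args.input_dir:
            parser.error("--input_dir is required when --role is consume")
        print(consume(args.input_dir))


if __name__ == "__main__":
    main()
```

In a pipeline definition, a producer step might be given `arguments=["--role", "produce", "--output_dir", step_data]` with `outputs=[step_data]`, and the consumer `arguments=["--role", "consume", "--input_dir", step_data]` with `inputs=[step_data]`, where `step_data` is a hypothetical `PipelineData` instance. Because the consumer lists the producer's output as an input, the pipeline also knows to run the producer first.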
