Azure ML Pipeline
The `azureml-pipeline` library is part of the Azure Machine Learning V1 Python SDK, used to build, optimize, and manage complex machine learning workflows as pipelines. It enables users to define sequences of steps, manage data dependencies, and run these pipelines on various Azure compute targets. The current version is 1.62.0, and it generally follows the release cadence of the broader Azure ML V1 SDK, with updates typically occurring monthly or bi-monthly.
Common errors
-
ModuleNotFoundError: No module named 'azureml.pipeline'
cause The `azureml-pipeline` library is not installed in the current Python environment.
fix Install the package: `pip install azureml-pipeline`
-
ModuleNotFoundError: No module named 'azureml.core'
cause The core `azureml-core` library, which provides essential components like `Workspace`, is not installed.
fix Install the core package: `pip install azureml-core`. It's often best to install both: `pip install azureml-pipeline azureml-core`.
-
AttributeError: 'MLClient' object has no attribute 'create_pipeline'
cause V2 SDK (`azure.ai.ml`) constructs such as `MLClient` are being used with V1 SDK (`azureml-pipeline`) pipeline definitions. `MLClient` belongs to the V2 SDK, while `Pipeline` belongs to V1, and `MLClient` has no `create_pipeline` method.
fix If using `azureml-pipeline` (V1 SDK), use `azureml.core.Workspace` and `azureml.pipeline.core.Pipeline`, and submit through `Experiment.submit`. If using V2, import `MLClient` from `azure.ai.ml`, build the pipeline with V2 constructs, submit it with `ml_client.jobs.create_or_update(...)`, and avoid `azureml-pipeline`.
-
azureml.core.authentication.AuthenticationException: Authentication failed.
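cause The Python environment does not have correct credentials or configuration to connect to your Azure ML Workspace.
fix Ensure `config.json` is in the working directory so that `Workspace.from_config()` can find it, or pass the subscription ID, resource group, and workspace name explicitly to `Workspace(...)` (the quickstart reads them from the `AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, and `AZURE_WORKSPACE_NAME` environment variables). Alternatively, use interactive login: import `InteractiveLoginAuthentication` from `azureml.core.authentication` and call `ws = Workspace.from_config(auth=InteractiveLoginAuthentication())`.
-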
azureml.core.compute.compute.ComputeTargetException: Compute target 'my-compute' not found.
cause The specified compute target does not exist or is misspelled in the Azure ML Workspace.
fix Verify the name of your compute target in the Azure ML studio portal and ensure it's correctly passed to `ComputeTarget(workspace=ws, name='my-compute')`.
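For the authentication error above, it can help to check which workspace settings are actually available before constructing a `Workspace`. The following is a minimal sketch using only the standard library. It assumes `config.json` holds the keys `subscription_id`, `resource_group`, and `workspace_name`, and it reuses the environment variable names from this page's quickstart. The helper name `resolve_workspace_settings` is hypothetical and not part of the SDK.

```python
import json
import os
from pathlib import Path

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

# Environment variable names taken from this page's quickstart.
# They are a convention of that script, not names defined by the SDK.
ENV_NAMES = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZURE_WORKSPACE_NAME",
}


def resolve_workspace_settings(config_path="config.json", environ=None):
    """Return (settings, missing_keys).

    Values come from config.json when it exists; environment variables
    take precedence over config.json for any key they define.
    """
    environ = os.environ if environ is None else environ
    settings = {}

    path = Path(config_path)
    if path.is_file():
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        settings.update({key: data[key] for key in REQUIRED_KEYS if data.get(key)})

    for key, env_name in ENV_NAMES.items():
        value = environ.get(env_name)
        if value:
            settings[key] = value

    missing = [key for key in REQUIRED_KEYS if key not in settings]
    return settings, missing


if __name__ == "__main__":
    settings, missing = resolve_workspace_settings()
    if missing:
        print("Missing workspace settings:", ", ".join(missing))
    else:
        print("All workspace settings found.")
```

If `missing` is empty, the three values can be passed to `Workspace(subscription_id, resource_group, workspace_name)` as in the quickstart.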
Warnings
- breaking `azureml-pipeline` is part of the Azure ML V1 SDK. Microsoft's recommended SDK is V2 (`azure.ai.ml`). Mixing V1 and V2 objects or concepts will lead to runtime errors (e.g., `AttributeError`, `TypeError`).
- gotcha The `azureml-pipeline` library requires `azureml-core` for fundamental functionalities like `Workspace`, `ComputeTarget`, and `Experiment`. Installing only `azureml-pipeline` will result in `ModuleNotFoundError` for these core classes.
- gotcha `azureml-pipeline` has specific Python version requirements. Current versions (1.62.0) support Python `3.8` and `3.9`. Using unsupported Python versions (e.g., Python 3.10+) will lead to installation failures or runtime errors.
- gotcha Pipeline steps rely on Azure ML compute targets and datastores. Errors can occur if the specified compute target does not exist, is not correctly configured, or lacks necessary permissions.
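The first two warnings can be checked up front. The sketch below reports which V1 modules cannot be located and whether the V2 package is installed alongside them. It relies only on `importlib.util.find_spec` from the standard library. The function names `is_importable` and `preflight` are hypothetical helpers, not SDK functions.

```python
import importlib.util

# V1 modules used by this page's quickstart, and the V2 package named in the warnings.
V1_MODULES = ("azureml.core", "azureml.pipeline.core", "azureml.pipeline.steps")
V2_MODULE = "azure.ai.ml"


def is_importable(module_name):
    """Return True if the module can be located on the current Python path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first; a missing parent
        # raises ModuleNotFoundError, which is a subclass of ImportError.
        return False


def preflight(modules=V1_MODULES, v2_module=V2_MODULE):
    """Return a list of findings; an empty list means nothing to report."""
    findings = [f"missing module: {name}" for name in modules if not is_importable(name)]
    if is_importable(v2_module):
        findings.append(f"{v2_module} is also installed; keep V1 and V2 objects separate")
    return findings


if __name__ == "__main__":
    for finding in preflight():
        print(finding)
```

Having both SDKs installed is not an error in itself; the finding is only a reminder not to pass V2 objects to V1 pipeline classes.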
Install
-
pip install azureml-pipeline azureml-core
Imports
- Workspace
from azureml.core import Workspace
- Experiment
from azureml.core import Experiment
- ComputeTarget
from azureml.core.compute import ComputeTarget, AmlCompute
- Pipeline
from azureml.pipeline.core import Pipeline
- PythonScriptStep
from azureml.pipeline.steps import PythonScriptStep
- PipelineData
from azureml.pipeline.core import PipelineData
Quickstart
import os
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep
# --- 1. Get Azure ML Workspace ---
# Authenticate via config.json or environment variables
try:
    ws = Workspace.from_config()
    print(f"Workspace loaded from config: {ws.name}")
except Exception:
    print("config.json not found or failed, trying environment variables...")
    # Replace with your actual subscription_id, resource_group, workspace_name
    subscription_id = os.environ.get("AZURE_SUBSCRIPTION_ID", "<YOUR_SUBSCRIPTION_ID>")
    resource_group = os.environ.get("AZURE_RESOURCE_GROUP", "<YOUR_RESOURCE_GROUP>")
    workspace_name = os.environ.get("AZURE_WORKSPACE_NAME", "<YOUR_WORKSPACE_NAME>")
    # Fail early if any of the three placeholders is still present
    if any(value.startswith("<YOUR_") for value in (subscription_id, resource_group, workspace_name)):
        raise ValueError("Please set AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, AZURE_WORKSPACE_NAME or provide config.json")
    ws = Workspace(subscription_id, resource_group, workspace_name)
    print(f"Workspace loaded from environment: {ws.name}")
# --- 2. Define Compute Target ---
cpu_cluster_name = "cpu-cluster-qs" # Name for your compute cluster
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print(f"Found existing compute target: {cpu_cluster_name}")
except Exception:
    print(f"Creating a new compute target: {cpu_cluster_name}")
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS3_V2",
        min_nodes=0,
        max_nodes=1
    )
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    cpu_cluster.wait_for_completion(show_output=True)
# --- 3. Create a Python script for a pipeline step ---
script_name = "my_pipeline_step_script.py"
with open(script_name, "w") as f:
    # platform.node() returns the host name on any OS; os.uname() is unavailable on Windows
    f.write("import platform; print(f'Hello from Azure ML Pipeline step on {platform.node()}')")
# --- 4. Define a Pipeline Step ---
step = PythonScriptStep(
    name="HelloStep",
    script_name=script_name,
    compute_target=cpu_cluster,
    source_directory=".",  # The directory containing the script
    allow_reuse=True  # Allows reuse of previous step runs if inputs/parameters are identical
)
# --- 5. Create and Submit the Pipeline ---
pipeline = Pipeline(workspace=ws, steps=[step])
print("Submitting pipeline...")
pipeline_run = Experiment(ws, 'MyFirstPipelineExperiment').submit(pipeline)
print(f"Pipeline submitted. Run ID: {pipeline_run.id}")
# Uncomment the line below to wait for the pipeline run to complete
# pipeline_run.wait_for_completion(show_output=True)
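The quickstart's step script only prints a message. Steps that hand data to later steps usually receive an output location as a command-line argument. The sketch below shows what such a step script might look like, using only the standard library. The argument name `--output_dir` and the file name `greeting.txt` are assumptions chosen for illustration.

```python
import argparse
import os
import platform


def main(argv=None):
    parser = argparse.ArgumentParser(description="Example pipeline step script")
    # Hypothetical argument name; the pipeline definition decides what is passed.
    parser.add_argument("--output_dir", default="outputs")
    # Ignore any extra arguments this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)

    # Create the output location if it does not exist yet.
    os.makedirs(args.output_dir, exist_ok=True)
    out_path = os.path.join(args.output_dir, "greeting.txt")

    message = f"Hello from Azure ML Pipeline step on {platform.node()}"
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(message + "\n")
    print(message)
    return out_path


if __name__ == "__main__":
    main()
```

On the pipeline side, a `PipelineData` object (listed under Imports) can be supplied for that argument, for example by passing `arguments=["--output_dir", output]` and `outputs=[output]` to `PythonScriptStep`, where `output = PipelineData("greeting", datastore=ws.get_default_datastore())`. The name `"greeting"` is likewise only an example.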