{"id":8872,"library":"azureml-pipeline-steps","title":"Azure ML Pipeline Steps","description":"The `azureml-pipeline-steps` library, part of the Azure Machine Learning Python SDK, provides classes to define individual computational units (steps) within an Azure ML pipeline. These steps can encapsulate Python scripts, data transfers, AutoML runs, and more, enabling the construction of complex MLOps workflows. The current version is 1.62.0, and it follows the release cadence of the broader Azure ML SDK, with frequent updates.","status":"active","version":"1.62.0","language":"en","source_language":"en","source_url":"https://github.com/Azure/azureml-sdk-for-python","tags":["azureml","mlops","pipeline","cloud","machine learning"],"install":[{"cmd":"pip install azureml-pipeline-steps azureml-core","lang":"bash","label":"Install core SDK and pipeline steps"}],"dependencies":[{"reason":"Provides core Azure ML functionalities like Workspace, ComputeTarget, and Pipeline submission, which are essential for running steps.","package":"azureml-core","optional":false}],"imports":[{"note":"PythonScriptStep for pipelines moved to azureml.pipeline.steps; the train namespace is for older estimator-based training.","wrong":"from azureml.train.steps import PythonScriptStep","symbol":"PythonScriptStep","correct":"from azureml.pipeline.steps import PythonScriptStep"},{"symbol":"DataTransferStep","correct":"from azureml.pipeline.steps import DataTransferStep"}],"quickstart":{"code":"import os\nfrom azureml.core import Workspace, Environment\nfrom azureml.data.datareference import DataReference\nfrom azureml.pipeline.core import PipelineData\nfrom azureml.pipeline.steps import PythonScriptStep\n\n# NOTE: For actual execution, ensure Azure ML workspace is configured\n# using a config.json file or environment variables for service principal.\n# Example: os.environ['AZUREML_ARM_SUBSCRIPTION'] = '...'\n# workspace = Workspace.from_config()\n\n# Define a dummy script file (must exist for 
PythonScriptStep to be valid)\nwith open(\"process_data.py\", \"w\") as f:\n    f.write(\"import argparse\\n\")\n    f.write(\"import os\\n\")\n    f.write(\"parser = argparse.ArgumentParser()\\n\")\n    f.write(\"parser.add_argument('--input_data', type=str)\\n\")\n    f.write(\"parser.add_argument('--output_data', type=str)\\n\")\n    f.write(\"args = parser.parse_args()\\n\")\n    f.write(\"print(f'Processing data from {args.input_data} to {args.output_data}')\\n\")\n    f.write(\"os.makedirs(args.output_data, exist_ok=True)\\n\")\n    f.write(\"with open(os.path.join(args.output_data, 'output.txt'), 'w') as out_f:\\n\")\n    f.write(\"    out_f.write('Processed data!')\\n\")\n\n# Create a simple environment (a curated environment is recommended in production)\n# environment = Environment.from_conda_specification(\"myenv\", \"./myenv.yml\")\n\n# Placeholder objects stand in for a real Workspace and ComputeTarget so this\n# example runs without an Azure connection; for actual execution, replace them:\n# workspace = Workspace.from_config()\n# compute_target = workspace.compute_targets['my-aml-compute']\nclass MockWorkspace:\n    def __init__(self):\n        self.name = \"mock_ws\"\n        self.subscription_id = \"mock_sub_id\"\n        self.resource_group = \"mock_rg\"\n\nclass MockComputeTarget:\n    def __init__(self, name):\n        self.name = name\n\nmock_workspace = MockWorkspace()\nmock_compute = MockComputeTarget('cpu-cluster')\n\n# Define the PipelineData output; when no datastore is given, the workspace's\n# default datastore is used at pipeline construction time.\nprocessed_data = PipelineData(\"processed_data\")\n\n# Create a PythonScriptStep\nstep = PythonScriptStep(\n    name=\"process-data-step\",\n    script_name=\"process_data.py\",\n    
arguments=[\"--input_data\", \"dummy_input_path\", \"--output_data\", processed_data],\n    inputs=[DataReference(datastore=mock_workspace.get_default_datastore() if hasattr(mock_workspace, 'get_default_datastore') else None, data_reference_name=\"dummy_input\", path_on_datastore=\"/dummy/input\")],\n    outputs=[processed_data],\n    compute_target=mock_compute.name,\n    source_directory=\".\",\n    runconfig=Environment.from_conda_specification(name='my_env', file_path='.azureml/my_env.yml').create_run_config() if os.path.exists('.azureml/my_env.yml') else None # Use an existing run config or Environment\n)\n\nprint(f\"Successfully created step: {step.name}\")\n\n# Clean up dummy script\nos.remove(\"process_data.py\")\n","lang":"python","description":"This quickstart demonstrates how to define a `PythonScriptStep`, a fundamental component of `azureml-pipeline-steps`. It shows how to link a Python script, pass arguments, specify inputs and outputs using `PipelineData` and `DataReference`, and associate it with a compute target. Note that for actual execution, you'll need to configure an Azure ML `Workspace`, a `ComputeTarget`, and a proper `Environment`."},"warnings":[{"fix":"Use `Workspace.from_config()` if `config.json` is present. Otherwise, use `Workspace.get(subscription_id='...', resource_group='...', workspace_name='...')` with an appropriate authentication object.","message":"Authentication is crucial and often a source of failure. Ensure your local environment is authenticated to Azure (e.g., via `az login`), or provide explicit credentials using `ServicePrincipalAuthentication` or `InteractiveLoginAuthentication` when instantiating `Workspace`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always use `azureml.core.Environment` objects to define environments for pipeline steps. 
You can register them in the workspace, retrieve curated ones, or create them from Conda specifications, then assign one via `RunConfiguration.environment`.","message":"Older patterns for defining execution environments, such as attaching `CondaDependencies` directly to a `RunConfiguration`, have been deprecated in favor of explicit `Environment` objects. Mixing old and new approaches can lead to errors.","severity":"deprecated","affected_versions":"Patterns from SDK < 1.15.0; deprecated in current releases"},{"fix":"Use `PipelineData` for intermediate data outputs between steps. For initial inputs, use `DataReference` pointing to registered datasets or datastore paths. Ensure your script accesses data using the paths provided via `argparse` or environment variables.","message":"Data transfer between steps via `PipelineData` or `DataReference` requires careful path management. Incorrectly specified paths, or attempting to access data before it is materialized, can cause `FileNotFoundError` or `PathNotFoundException` within your step's script.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always install compatible versions of `azureml-core` and `azureml-pipeline-steps`. Refer to the official Azure ML SDK release notes for breaking changes and the version compatibility matrix. Pin your dependencies in `requirements.txt`.","message":"The `azureml-sdk` components, including `azureml-pipeline-steps`, often introduce breaking changes between minor versions, particularly in environment definitions, data APIs, and compute targets. 
Incompatible `azureml-core` and `azureml-pipeline-steps` versions typically surface as import errors or `AttributeError`s when constructing or submitting a pipeline.","severity":"breaking","affected_versions":"Between SDK major/minor versions (e.g., 1.0 to 1.15, or 1.15 to 1.30)"},"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install the `azureml-core` package: `pip install azureml-core`.","cause":"The `azureml-core` package, which provides fundamental Azure ML SDK functionality like `Workspace` and `Environment`, is not installed.","error":"ModuleNotFoundError: No module named 'azureml.core'"},{"fix":"Verify your `config.json` is present and correct, or provide explicit `name`, `subscription_id`, and `resource_group` parameters along with an `auth` object when calling `Workspace.get()`.","cause":"The Azure ML Workspace could not be found or authenticated. This usually indicates an incorrect subscription, resource group, or workspace name, or missing authentication credentials.","error":"azureml.exceptions.UserErrorException: Workspace not found for subscription ID..."},{"fix":"Examine the detailed logs of the failed step in Azure ML Studio to identify the specific error message from your script. Ensure all required packages are specified in the step's `Environment` definition and that script paths and arguments are correct.","cause":"This generic error indicates a failure within the Python script executed by the pipeline step. Common causes include missing dependencies in the step's environment, errors in script logic, or incorrect input/output paths.","error":"ScriptExecutionException: User program failed with exit code 1"},{"fix":"Verify the `Environment` object's name and definition. If creating from a file, ensure the file path is correct and the Conda/Docker specification is valid. 
Consider using curated environments for simplicity if applicable.","cause":"The `Environment` object specified for the pipeline step either doesn't exist, has an invalid definition (e.g., incorrect Conda dependencies file), or lacks permissions to be created/accessed.","error":"UserErrorException: The specified environment is not found or cannot be created."}]}