{"id":6317,"library":"azureml-pipeline-core","title":"Azure Machine Learning Pipeline Core","description":"This package contains core functionality for Azure Machine Learning pipelines, enabling the definition and execution of configurable machine learning workflows. As of version 1.62.0, it is a key component of the Azure ML SDK v1, providing building blocks for complex ML workflows. It follows a regular release cadence alongside other v1 SDK components.","status":"active","version":"1.62.0","language":"en","source_language":"en","source_url":"https://github.com/Azure/azureml-sdk-for-python","tags":["azure","machine-learning","pipeline","mlops","azureml-sdk-v1"],"install":[{"cmd":"pip install azureml-pipeline-core","lang":"bash","label":"Install core pipeline package"}],"dependencies":[{"reason":"Required for logging and metrics collection within Azure ML pipelines.","package":"azureml-telemetry","optional":false},{"reason":"Provides core REST client functionality for interacting with Azure services.","package":"msrest","optional":false},{"reason":"Used for Azure Active Directory authentication, essential for secure access.","package":"azure-identity","optional":false}],"imports":[{"symbol":"Pipeline","correct":"from azureml.pipeline.core import Pipeline"},{"symbol":"PipelineData","correct":"from azureml.pipeline.core import PipelineData"},{"symbol":"PipelineParameter","correct":"from azureml.pipeline.core import PipelineParameter"}],"quickstart":{"code":"import os\nfrom azureml.core import Workspace, Experiment, Environment\nfrom azureml.core.runconfig import RunConfiguration\nfrom azureml.pipeline.core import Pipeline, PipelineParameter, PipelineData\nfrom azureml.pipeline.steps import PythonScriptStep\n\n# NOTE: For an actual Azure ML run, ensure you have 'azureml-core' installed\n# and configured your workspace (e.g., via 'az login' and 'ws.write_config()').\n# This example mocks the Workspace for local execution without live Azure setup.\n\n# --- Mock 
Workspace for local execution (replace with actual Workspace.from_config() for Azure) ---\ntry:\n    # Attempt to load the actual workspace if configured\n    ws = Workspace.from_config()\n    print(f\"Loaded Workspace: {ws.name}\")\nexcept Exception:\n    print(\"Could not load workspace from config. Using dummy for example execution.\")\n    class MockDatastore:\n        def __init__(self):\n            self.name = \"workspaceblobstore\"\n        def path(self, path_on_datastore): # Mimics the Datastore.path() method\n            return f\"azureml://datastores/workspaceblobstore/paths/{path_on_datastore}\"\n    class MockWorkspace:\n        def __init__(self):\n            self.name = \"mock_ws\"\n            self.resource_group = \"mock_rg\"\n            self.subscription_id = \"mock_sub_id\"\n            # Mimics the Workspace.compute_targets property, which is a dict of\n            # name -> ComputeTarget (empty here; 'local' is used as a fallback)\n            self.compute_targets = {}\n        def get_default_datastore(self):\n            return MockDatastore()\n    ws = MockWorkspace()\n# -------------------------------------------------------------------------------------------------\n\n# Define an environment for the pipeline step (for an Azure run, use a real curated/custom environment)\nmyenv = Environment(\"my-python-env\")\nmyenv.python.user_managed_dependencies = False\nmyenv.docker.base_image = \"mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20210707.v1\"\n\n# Create a run configuration for the step. DockerConfiguration replaces the\n# deprecated 'Environment.docker.enabled' flag in recent v1 SDK releases.\nfrom azureml.core.runconfig import DockerConfiguration\nrun_config = RunConfiguration()\nrun_config.environment = myenv\nrun_config.docker = DockerConfiguration(use_docker=True)\n\n# Define a pipeline parameter\npipeline_param = PipelineParameter(name=\"input_multiplier\", default_value=5)\n\n# Define an output for the step, using PipelineData for intermediate data\noutput_data = PipelineData(name=\"multiplied_output\", datastore=ws.get_default_datastore())\n\n# Create a dummy Python script for the pipeline step\nscript_content = \"\"\"\nimport argparse\nimport os\nfrom azureml.core import Run\n\nparser = argparse.ArgumentParser()\nparser.add_argument(\"--input_multiplier\", type=int)\nparser.add_argument(\"--output_path\", type=str)\nargs = parser.parse_args()\n\nprint(f\"Received input_multiplier: {args.input_multiplier}\")\n\n# Get the run context (returns an offline run outside of an Azure ML submission)\nrun = Run.get_context()\n\n# Create the output directory and write the result file\nos.makedirs(args.output_path, exist_ok=True)\nresult = args.input_multiplier * 10\noutput_file_path = os.path.join(args.output_path, \"result.txt\")\nwith open(output_file_path, \"w\") as f:\n    f.write(f\"Calculation result: {result}\")\n\nprint(f\"Outputting data to: {output_file_path}\")\nrun.upload_file(name=\"outputs/result.txt\", path_or_stream=output_file_path)\n\"\"\"\nscript_file = \"my_pipeline_script.py\"\nwith open(script_file, \"w\") as f:\n    f.write(script_content)\n\n# Create a PythonScriptStep\nstep1 = PythonScriptStep(\n    name=\"multiply_step\",\n    script_name=script_file,\n    arguments=[\n        \"--input_multiplier\", pipeline_param,\n        \"--output_path\", output_data\n    ],\n    outputs=[output_data],\n    # Workspace.compute_targets is a property (dict), not a method; fall back\n    # to 'local' when no 'cpu-cluster' compute target exists\n    compute_target=ws.compute_targets.get(\"cpu-cluster\") or \"local\",\n    runconfig=run_config,\n    source_directory=\".\"\n)\n\n# Create the pipeline\npipeline = Pipeline(workspace=ws, steps=[step1])\n\nprint(\"Pipeline created successfully.\")\nprint(\"To run this pipeline on Azure, ensure your workspace is configured and uncomment the submission code below.\")\n\n# # Example of how to submit the pipeline to Azure (requires an actual Workspace & Experiment):\n# # experiment = Experiment(ws, \"my_pipeline_experiment\")\n# # pipeline_run = experiment.submit(pipeline, pipeline_parameters={\"input_multiplier\": 7})\n# # pipeline_run.wait_for_completion(show_output=True)\n\n# Clean up the dummy script file\nos.remove(script_file)\n","lang":"python","description":"This 
quickstart demonstrates how to define a basic Azure ML Pipeline using `azureml-pipeline-core`. It sets up a mocked workspace for local execution, defines a pipeline parameter, a `PipelineData` output, and a `PythonScriptStep`. The pipeline includes a simple script that performs a calculation and saves an output file. Note that `azureml-core` is required for full functionality and interaction with an actual Azure ML Workspace."},"warnings":[{"fix":"Decide whether to use SDK v1 or v2. If migrating to v2, be prepared for a full rewrite of pipeline definitions. For new projects, consider starting with SDK v2 (`pip install azure-ai-ml`) unless specific v1 features are required.","message":"This package (`azureml-pipeline-core`) is part of the Azure ML SDK v1. Azure ML SDK v2, introduced with the `azure-ai-ml` package, uses a completely different API and conceptual model (e.g., YAML-based definitions, `MLClient`). Code written for v1 is not compatible with v2.","severity":"breaking","affected_versions":"All versions (v1 vs v2 distinction)"},{"fix":"Always install `azureml-core` alongside `azureml-pipeline-core` (e.g., `pip install azureml-core azureml-pipeline-core`) and ensure your workspace is properly configured for authentication.","message":"While `azureml-pipeline-core` is a standalone package, it is practically unusable without `azureml-core` for interacting with an Azure Machine Learning Workspace, Experiments, and Compute Targets. Many objects (like `Workspace`, `Experiment`, `Environment`, `RunConfiguration`) come from `azureml-core`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use `PipelineData` for intermediate data that is produced by one step and consumed by another within the same pipeline run. Use `DataReference` to point a step at pre-existing data at a known path in a datastore; for data registered as a dataset in the workspace, use the `Dataset` class instead.","message":"There is a common confusion between `PipelineData` and `DataReference`. 
`PipelineData` (from `azureml.pipeline.core`) is used for passing intermediate data *between* steps within a pipeline run. `DataReference` (from `azureml.data.data_reference`) is used for referencing pre-existing data at a path in a datastore; data that has been registered as a dataset in your Azure ML Workspace is consumed via the `Dataset` class instead.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Thoroughly test environments locally first. Use curated environments where possible, or define custom environments explicitly with all necessary dependencies listed. Ensure `requirements.txt` or `conda_dependencies.yml` is complete and correctly specified in your `Environment`.","message":"Pipeline steps are highly sensitive to their execution environments. Incorrectly defined `Environment` objects, missing dependencies, or mismatched Python versions within the environment can lead to pipeline step failures that are hard to debug.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z"}