Azure ML AutoML Training (SDK v1)
The `azureml-train-automl` package is part of the Azure Machine Learning SDK v1 and automatically searches for the best machine learning model and its hyperparameters. It streamlines model selection, hyperparameter tuning, and feature engineering for ML tasks such as classification, regression, and forecasting. As of v1.62.0, Azure Machine Learning SDK v1 is deprecated, and support ends on June 30, 2026. Users are strongly advised to migrate to Azure Machine Learning SDK v2 for continued support and new features.
Common errors
- ImportError: cannot import name 'AutoMLConfig'
  cause: Often occurs after upgrading `azureml-train-automl` from very old versions, which can leave a partial or corrupted installation.
  fix: Uninstall the package completely and reinstall it: `pip uninstall azureml-train-automl` followed by `pip install azureml-train-automl`. If you are still on an older SDK release, make sure the other `azureml-*` packages in the environment are at a matching version.
- ModuleNotFoundError: No module named 'sklearn.decomposition._truncated_svd' (or similar 'No module named' errors)
  cause: The `azureml-train-automl` SDK version is incompatible with the locally installed `scikit-learn` or another dependency.
  fix: Install the `scikit-learn` version your SDK release expects, typically `scikit-learn==0.22.1` for SDK v1.13.0 and above: `pip install scikit-learn==0.22.1`. Consult the environment details of your specific Azure ML run to confirm the pin.
- AttributeError: 'SimpleImputer' object has no attribute 'add_indicator' (or similar 'AttributeError')
  cause: As with `ModuleNotFoundError`, this points to a version incompatibility, usually in `scikit-learn` or `pandas`, where a method or attribute that AutoML's internal code expects is missing.
  fix: Align your `pandas` and `scikit-learn` versions with the requirements of your `azureml-train-automl` release. For SDK v1.13.0+, use `pandas==0.25.1` and `scikit-learn==0.22.1`.
- ValidationException during local AutoML training
  cause: The dependencies in the local Python environment do not match the requirements for local AutoML training.
  fix: Make sure all required dependencies are installed locally at matching versions. For complex local setups, consider an Azure ML Compute Instance, which provides a pre-configured environment that meets AutoML's requirements, or define a custom Conda environment for the run.
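A quick way to spot the version mismatches described above is to compare the installed versions against the pins you expect. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The `EXPECTED_PINS` values are the SDK v1.13.0+ pins quoted above and are an assumption to adjust for your release; `installed_version` and `find_pin_mismatches` are hypothetical helper names.

```python
from importlib import metadata

# Assumed pins, copied from the SDK v1.13.0+ advice above.
# Replace them with the versions your azureml-train-automl release actually requires.
EXPECTED_PINS = {
    "pandas": "0.25.1",
    "scikit-learn": "0.22.1",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_pin_mismatches(expected=EXPECTED_PINS, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (installed, expected)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


if __name__ == "__main__":
    for name, (found, wanted) in find_pin_mismatches().items():
        print(f"{name}: installed={found or 'missing'}, expected={wanted}")
        print(f"  fix: pip install {name}=={wanted}")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.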
Warnings
- breaking Azure Machine Learning SDK v1, which `azureml-train-automl` is part of, is deprecated as of March 31, 2025. Support will end on June 30, 2026. Existing workflows will continue to operate but could be exposed to security risks or breaking changes.
- gotcha Upgrading `azureml-train-automl` from versions prior to `1.0.76` can lead to partial installations and import failures due to internal dependency conflicts.
- breaking Several AutoML algorithms for Regression (`FastLinearRegressor`, `OnlineGradientDescentRegressor`) and Classification (`AveragedPerceptronClassifier`) were deprecated and are no longer supported in versions `1.49.0` and above.
- gotcha Dependency mismatches, especially with `pandas` and `scikit-learn`, are a common source of errors (e.g., `Module not found`, `ImportError`, `AttributeError`) due to strict version pinning in older SDK v1 releases.
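The two version thresholds in these warnings can be encoded as small checks, for example before building a model list for `AutoMLConfig`. This is a minimal sketch: `drop_removed_models`, `needs_clean_reinstall`, and the simple numeric `parse_version` are hypothetical helpers, not part of the SDK, and the parser ignores pre-release or post-release suffixes.

```python
# Algorithms listed in the warnings above as unsupported from version 1.49.0 onward.
REMOVED_IN_1_49 = {
    "FastLinearRegressor",
    "OnlineGradientDescentRegressor",
    "AveragedPerceptronClassifier",
}


def parse_version(text):
    """Parse the leading numeric parts of a release string, e.g. '1.49.0' -> (1, 49, 0)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'post1' or 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def drop_removed_models(models, sdk_version):
    """Return the model names that are still supported by the given SDK version."""
    if parse_version(sdk_version) >= (1, 49, 0):
        return [name for name in models if name not in REMOVED_IN_1_49]
    return list(models)


def needs_clean_reinstall(previous_version):
    """True when upgrading from a release older than 1.0.76 (see the upgrade gotcha)."""
    return parse_version(previous_version) < (1, 0, 76)


if __name__ == "__main__":
    candidates = ["LightGBM", "AveragedPerceptronClassifier", "LogisticRegression"]
    print(drop_removed_models(candidates, "1.50.0"))  # ['LightGBM', 'LogisticRegression']
    print(needs_clean_reinstall("1.0.60"))  # True
```

The filtered list could then be passed as `allowed_models` (named `whitelist_models` in older releases); when `needs_clean_reinstall` is true, the uninstall-then-install fix from Common errors applies.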
Install
pip install azureml-train-automl
Imports
- AutoMLConfig
from azureml.train.automl import AutoMLConfig
- Experiment
from azureml.core.experiment import Experiment
- Workspace
from azureml.core import Workspace
- Dataset
from azureml.core.dataset import Dataset
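To tell a broken installation apart from a wrong import path (see the `ImportError` entry under Common errors), each import above can be tried in isolation. A minimal sketch using `importlib`; `check_imports` is a hypothetical helper that only reports failures and does not repair anything.

```python
import importlib

# The import paths listed above, keyed by the public name each module should provide.
IMPORTS = {
    "AutoMLConfig": "azureml.train.automl",
    "Experiment": "azureml.core.experiment",
    "Workspace": "azureml.core",
    "Dataset": "azureml.core.dataset",
}


def check_imports(imports=IMPORTS):
    """Return {name: None} for each successful import, or {name: error message} on failure."""
    results = {}
    for name, module_path in imports.items():
        try:
            module = importlib.import_module(module_path)
            getattr(module, name)  # raises AttributeError if the name is not exported
            results[name] = None
        except (ImportError, AttributeError) as exc:
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for name, error in check_imports().items():
        print(f"{name}: {'ok' if error is None else error}")
```

A `ModuleNotFoundError` for `azureml` suggests the package is not installed in the active interpreter, while an `ImportError` or `AttributeError` on a single name points to a partial or mismatched installation.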
Quickstart
import logging
import os
from azureml.core.dataset import Dataset
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
# NOTE: Replace with your actual workspace details or ensure config.json is present
# Load the workspace from config.json, falling back to environment variables
try:
    ws = Workspace.from_config()
    print(f"Workspace loaded: {ws.name}")
except Exception as e:
    print(f"Could not load workspace from config. Attempting environment variables. Error: {e}")
    subscription_id = os.environ.get("AZUREML_SUBSCRIPTION_ID", "YOUR_SUBSCRIPTION_ID")
    resource_group = os.environ.get("AZUREML_RESOURCE_GROUP", "YOUR_RESOURCE_GROUP")
    workspace_name = os.environ.get("AZUREML_WORKSPACE_NAME", "YOUR_WORKSPACE_NAME")
    # Fail early if any value is still a placeholder
    if "YOUR_" in subscription_id + resource_group + workspace_name:
        raise ValueError("Please configure your Azure ML Workspace details via config.json or environment variables.")
    ws = Workspace.get(name=workspace_name, subscription_id=subscription_id, resource_group=resource_group)
    print(f"Workspace loaded from env: {ws.name}")
experiment_name = "automl-quickstart-exp"
experiment = Experiment(ws, experiment_name)
# Load the sample credit-card data as a TabularDataset directly from its public URL.
# To use your own registered data instead: Dataset.get_by_name(ws, name='my_dataset')
data_url = "https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv"
training_data = Dataset.Tabular.from_delimited_files(path=data_url)
# Configure AutoML
automl_config = AutoMLConfig(
    task='classification',
    primary_metric='accuracy',
    training_data=training_data,
    label_column_name='Class',
    compute_target='local',
    experiment_timeout_minutes=15,
    max_concurrent_iterations=1,
    n_cross_validations=2,
    iterations=5,
    verbosity=logging.INFO,
)
# Submit the AutoML run
# NOTE: compute_target='local' runs in the current Python environment, which needs the full set of AutoML dependencies.
# For remote compute, configure an AmlCompute target and pass it as compute_target in AutoMLConfig.
print("Submitting AutoML experiment...")
run = experiment.submit(automl_config, show_output=True)
print(f"AutoML experiment submitted: {run.id}")
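For the remote-compute option mentioned in the quickstart comments, a common SDK v1 pattern is to reuse an existing AmlCompute cluster or provision one on first use. This is a sketch under stated assumptions: the cluster name, VM size, and node count are placeholder values, and `get_or_create_cluster` is a hypothetical helper name. The SDK imports sit inside the function so that defining it does not require `azureml-core`.

```python
def get_or_create_cluster(ws, cluster_name="cpu-cluster", vm_size="STANDARD_DS3_V2", max_nodes=4):
    """Return an existing AmlCompute cluster, or provision one if it does not exist yet.

    cluster_name, vm_size and max_nodes are placeholder defaults; adjust them to your quota.
    """
    # Imported lazily so this helper can be defined without the SDK installed.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Raises ComputeTargetException when no target with this name exists in the workspace.
        cluster = ComputeTarget(workspace=ws, name=cluster_name)
    except ComputeTargetException:
        config = AmlCompute.provisioning_configuration(vm_size=vm_size, max_nodes=max_nodes)
        cluster = ComputeTarget.create(ws, cluster_name, config)
        cluster.wait_for_completion(show_output=True)
    return cluster
```

The returned target would replace `compute_target='local'`, for example `AutoMLConfig(..., compute_target=get_or_create_cluster(ws))`. On a cluster, `max_concurrent_iterations` can be raised, but it should not exceed the cluster's maximum node count.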