SageMaker Experiments SDK
sagemaker-experiments is an open-source Python library from AWS for experiment tracking in Amazon SageMaker jobs and notebooks; its import name is `smexperiments`. It lets users create, manage, and query machine learning experiments, trials, and trial components to track model parameters, metrics, and artifacts. The library maintains an active release cadence, with frequent minor updates.
Common errors
-
ModuleNotFoundError: No module named 'sagemaker.experiments'
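cause: The `sagemaker-experiments` package installs a module named `smexperiments`, not `sagemaker.experiments` (and not `sagemaker_experiments`). The `sagemaker.experiments` module belongs to the SageMaker Python SDK (`pip install sagemaker`). It exists only in recent SDK versions, where it provides the newer `Run` API.
fix: Install the library with `pip install sagemaker-experiments` and import it from `smexperiments`, e.g. `from smexperiments.experiment import Experiment`. If you are following examples written for `sagemaker.experiments`, upgrade the SageMaker Python SDK instead (`pip install -U sagemaker`).
-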
AttributeError: 'Session' object has no attribute 'sagemaker_client' (or similar 'No default SageMaker session found' messages)
cause: The code is running outside a SageMaker Studio notebook or SageMaker job, and boto3 has no usable credentials or region. Alternatively, a session object of the wrong type was passed where the library expects a boto3 SageMaker client.
fix: Configure boto3 with valid AWS credentials and a region (e.g., via environment variables, `~/.aws/credentials`, or `~/.aws/config`). When running locally, create the client explicitly and pass it in: `import boto3; sm_client = boto3.client('sagemaker', region_name='your-region')`, then call, for example, `Experiment.create(..., sagemaker_boto_client=sm_client)`.
-
An error occurred (ValidationException) when calling the CreateExperiment operation: The Experiment with name '...' already exists.
cause: You are trying to create an experiment with a name that is already in use in your AWS account and region.
fix: Choose a unique name (e.g., append a timestamp or process ID). To work with an existing experiment, use `Experiment.load(experiment_name='your-experiment-name', sagemaker_boto_client=sm_client)` instead of `Experiment.create()`.
-
TypeError: 'TrialComponent' object is not iterable (or similar type errors when logging)
cause: `Tracker.log_parameters()` expects a flat dictionary whose values are strings or numbers. `Tracker.log_metric()` takes a single metric name and numeric value per call. Passing complex objects (nested dicts, lists, or `TrialComponent` instances) where these scalar values are expected will fail.
fix: Pass parameters as a flat dictionary with scalar values, and log metrics one at a time. Serialize complex objects to strings (e.g., JSON), or store them as artifacts in S3 and log their S3 URIs.
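Two of these errors, name collisions and non-scalar logging values, can be caught locally before any AWS call. Below is a minimal sketch using only the standard library. `unique_name`, `is_valid_name`, and `flatten_for_logging` are hypothetical helper names, not part of sagemaker-experiments. The naming rule (at most 120 characters; letters, digits, and hyphens; starting and ending with a letter or digit) is an assumption taken from the SageMaker `CreateExperiment` API reference and should be verified for your use.

```python
import json
import numbers
import re
import time
import uuid
from typing import Any, Dict

# Assumed naming rule (verify against the CreateExperiment / CreateTrial API reference):
# 1-120 characters, letters, digits and hyphens, starting and ending with a letter or digit.
MAX_NAME_LENGTH = 120
_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9])*$")


def is_valid_name(name: str) -> bool:
    """Check a name against the assumed length and character rules."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_PATTERN.match(name) is not None


def unique_name(prefix: str) -> str:
    """Append a timestamp and a random suffix so repeated runs do not reuse a name."""
    # Replace disallowed characters with hyphens and trim hyphens from both ends.
    clean = re.sub(r"[^a-zA-Z0-9-]", "-", prefix).strip("-") or "experiment"
    suffix = f"{time.strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}"
    # Shorten the prefix so that "<prefix>-<suffix>" stays within the length limit.
    clean = clean[: MAX_NAME_LENGTH - len(suffix) - 1].rstrip("-")
    return f"{clean}-{suffix}"


def flatten_for_logging(data: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys and JSON-encode non-scalar leaves."""
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing their keys with the parent key.
            flat.update(flatten_for_logging(value, full_key, sep))
        elif isinstance(value, (str, numbers.Number)):
            # Strings and numbers (bool is a numbers.Number subclass) pass through unchanged.
            flat[full_key] = value
        else:
            # Lists, None and other objects become JSON text; str() is the fallback encoder.
            flat[full_key] = json.dumps(value, default=str)
    return flat
```

For example, `experiment_name = unique_name("my-quickstart-experiment")` avoids the `ValidationException` on reruns. `tracker.log_parameters(flatten_for_logging(config))` turns a nested configuration such as `{"optimizer": {"name": "Adam"}}` into `{"optimizer.name": "Adam"}` before logging.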
Warnings
- breaking The `sklearn` dependency was renamed to `scikit-learn` in `v0.1.42`. Projects directly depending on `sklearn` might experience `ModuleNotFoundError` if `scikit-learn` is not installed.
- breaking Support for Python 3.6 was officially dropped in `v0.1.42`. Users running `sagemaker-experiments` on Python 3.6 will encounter compatibility issues.
- gotcha When `Tracker.create()` is used outside a SageMaker training job, it creates a new, standalone `TrialComponent`. To associate this component with a `Trial` you created yourself (e.g., with `Trial.create()`), add it explicitly with `Trial.add_trial_component()`.
- gotcha A bug prior to `v0.1.44` caused issues loading trial components for jobs with mixed-case names. This could lead to difficulties in retrieving or visualizing experiment data.
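The version notes above can be turned into a start-up check. This is a sketch that makes two assumptions. First, the distribution name is `sagemaker-experiments`, matching the install command. Second, `parse_version` is a hypothetical helper that only understands plain dotted releases and keeps the leading digits of each component. It uses the standard library's `importlib.metadata`, so no extra packages are needed.

```python
import sys
from importlib import metadata
from typing import List, Optional, Sequence, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted release such as '0.1.43' into (0, 1, 43).

    Only the leading digits of each component are kept, so '0.1.45rc1' -> (0, 1, 45).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(dist: str = "sagemaker-experiments") -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def known_issues(version: Optional[str], python: Sequence[int] = sys.version_info) -> List[str]:
    """List the warnings from this page that apply to the given library and Python versions."""
    issues = []
    if tuple(python[:2]) <= (3, 6):
        issues.append("Python 3.6 or older is not supported since v0.1.42")
    if version is not None and parse_version(version) < (0, 1, 44):
        issues.append("versions before v0.1.44 may fail to load trial components for mixed-case job names")
    return issues


if __name__ == "__main__":
    for issue in known_issues(installed_version()):
        print(f"warning: {issue}")
```

Passing `python` explicitly lets the check be exercised for other interpreter versions; by default it inspects the running interpreter.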
Install
-
pip install sagemaker-experiments
Imports
- Experiment
from smexperiments.experiment import Experiment
- Trial
from smexperiments.trial import Trial
- Tracker
from smexperiments.tracker import Tracker
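This package installs the `smexperiments` module, while `sagemaker.experiments` ships with recent versions of the SageMaker Python SDK. To see which one an environment provides without importing it, you can use the standard library's `importlib.util.find_spec`. `module_available` and `report_experiment_modules` are hypothetical helper names used only for this sketch.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be found on the import path.

    For dotted names, find_spec imports the parent package first and raises
    ImportError (ModuleNotFoundError) if that parent is missing, so treat that as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def report_experiment_modules() -> dict:
    """Check which experiment-tracking modules are importable in this environment."""
    return {
        # Module installed by `pip install sagemaker-experiments`
        "smexperiments": module_available("smexperiments"),
        # Module shipped with recent versions of the SageMaker Python SDK (`pip install sagemaker`)
        "sagemaker.experiments": module_available("sagemaker.experiments"),
    }


if __name__ == "__main__":
    for module, ok in report_experiment_modules().items():
        print(f"{module}: {'available' if ok else 'missing'}")
```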
Quickstart
import os
import time

import boto3
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial
from smexperiments.tracker import Tracker

# smexperiments calls SageMaker through a boto3 "sagemaker" client.
# In SageMaker Studio or a SageMaker job, credentials and region are usually preconfigured.
# For local execution, set up AWS credentials and a region first (e.g., via environment variables).
boto_session = boto3.Session(region_name=os.environ.get("AWS_REGION", "us-east-1"))
sm_client = boto_session.client("sagemaker")

# Names must be unique within the account and region, so add a timestamp and process ID.
suffix = f"{int(time.time())}-{os.getpid()}"
experiment_name = f"my-quickstart-experiment-{suffix}"
trial_name = f"my-quickstart-trial-{suffix}"

# 1. Create an Experiment
# Experiment.create() makes a new experiment; Experiment.load() retrieves an existing one.
my_experiment = Experiment.create(
    experiment_name=experiment_name,
    description="A simple quickstart experiment for sagemaker-experiments",
    sagemaker_boto_client=sm_client,
)
print(f"Created Experiment: {my_experiment.experiment_name}")

# 2. Create a Trial within the Experiment
my_trial = Trial.create(
    trial_name=trial_name,
    experiment_name=experiment_name,
    sagemaker_boto_client=sm_client,
)
print(f"Created Trial: {my_trial.trial_name}")

# 3. Use a Tracker to log parameters and metrics (e.g., simulating a training run).
# Outside a SageMaker job, Tracker.create() creates a new, standalone TrialComponent.
with Tracker.create(
    display_name="TrainingJobComponent",
    boto3_session=boto_session,
    sagemaker_boto_client=sm_client,
) as tracker:
    tracker.log_parameters({"learning_rate": 0.01, "epochs": 10, "optimizer": "Adam"})
    # log_metric() takes one metric per call. It writes to a local metrics file rather than
    # calling SageMaker directly, so these values may only be ingested inside a SageMaker job.
    for metric_name, value in {"accuracy": 0.85, "loss": 0.15, "f1_score": 0.82}.items():
        tracker.log_metric(metric_name, value)
    print(f"Logged data to TrialComponent: {tracker.trial_component.trial_component_name}")

# The tracker's TrialComponent is not linked to any trial yet, so associate it explicitly.
my_trial.add_trial_component(tracker.trial_component)

print("Experiment, Trial, and TrialComponent created and data logged.")
print("You can view these in SageMaker Studio under the Experiments tab.")

# Optional: clean up created resources (uncomment to enable).
# delete_all(action="--force") removes the experiment together with its trials and trial components.
# print("Cleaning up resources...")
# my_experiment.delete_all(action="--force")
# print("Cleanup complete.")
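The quickstart's logging calls need AWS access. To unit-test training code offline, one option is to pass the tracker in as an argument and substitute a test double. `FakeTracker` and `train` below are hypothetical names and are not part of sagemaker-experiments. `FakeTracker` only mimics the `log_parameters()` and `log_metric()` methods and the context-manager use of `Tracker`, recording calls in memory instead of contacting SageMaker.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeTracker:
    """Hypothetical in-memory stand-in for the Tracker methods used by the training code."""

    def __init__(self, display_name: Optional[str] = None):
        self.display_name = display_name
        self.parameters: Dict[str, Any] = {}
        self.metrics: List[Tuple[str, float, Optional[int]]] = []
        self.closed = False

    def log_parameters(self, parameters: Dict[str, Any]) -> None:
        # Record parameters instead of saving them to a TrialComponent.
        self.parameters.update(parameters)

    def log_metric(self, metric_name: str, value: float, timestamp=None,
                   iteration_number: Optional[int] = None) -> None:
        # Record (name, value, iteration); the timestamp is accepted but ignored.
        self.metrics.append((metric_name, value, iteration_number))

    def __enter__(self) -> "FakeTracker":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.closed = True


def train(tracker, learning_rate: float = 0.01, epochs: int = 3) -> float:
    """Toy training loop (hypothetical) that logs through whatever tracker it is given."""
    tracker.log_parameters({"learning_rate": learning_rate, "epochs": epochs})
    loss = 1.0
    for epoch in range(epochs):
        loss *= 0.5  # stand-in for a real optimization step
        tracker.log_metric("loss", loss, iteration_number=epoch)
    return loss
```

In tests, `with FakeTracker(display_name="unit-test") as fake: train(fake)` lets you assert on `fake.parameters` and `fake.metrics`. In a SageMaker job or notebook, the same `train` function receives the real tracker from `Tracker.create()`. Keeping the tracker as a parameter, rather than creating it inside `train`, is what makes this substitution possible.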