AWS Step Functions Data Science SDK
The AWS Step Functions Data Science SDK is an open-source Python library that allows data scientists to easily create, visualize, and execute machine learning (ML) workflows using Amazon SageMaker and AWS Step Functions. It lets you orchestrate AWS infrastructure at scale directly from Python code or Jupyter notebooks, abstracting away the need to provision and integrate AWS services separately. The most recent release covered here is version 2.3.0.
Common errors
- "You don't have permissions to perform this action."
  Cause: The IAM role associated with the Step Functions execution lacks the necessary permissions to call an AWS service API required by a state in the workflow.
  Fix: Review the `role` ARN provided to the `Workflow` constructor. Ensure the IAM role's policy explicitly grants `Allow` access for the specific actions (e.g., `sagemaker:CreateTrainingJob`, `lambda:InvokeFunction`, `s3:GetObject`, `states:StartExecution`) on the required resources.
- "An error occurred (ValidationException) when calling the CreateStateMachine operation: State machine definition is invalid."
  Cause: The JSON definition generated for the state machine contains syntax errors, invalid state names, or incorrect transitions.
  Fix: Check the `state_id` values for uniqueness and length (max 128 characters). Validate the workflow structure, especially `Next` states and `Catch` rules. Use `workflow.definition.to_json(pretty=True)` to inspect the generated ASL definition and compare it against the Amazon States Language specification.
- "The input doesn't meet the required format or constraints."
  Cause: Parameters passed to a step (e.g., `TrainingStep` `data` or `hyperparameters`) or to the workflow execution are not in the expected format or exceed size limits.
  Fix: Consult the `stepfunctions` SDK documentation for the specific step or method to understand the expected input format and constraints. For SageMaker steps, refer to the SageMaker Python SDK documentation for `Estimator.fit()` or `TrainingInput` arguments. Check for `States.DataLimitExceeded` if the payload size is excessive.
- "The request timed out."
  Cause: An AWS service task or an API call within a step took longer than its configured timeout, or a network issue prevented a timely response.
  Fix: For Task states, increase `TimeoutSeconds` (if applicable) or `HeartbeatSeconds` to allow more time. For long-running SageMaker jobs, ensure the estimator's `max_run` (maximum training runtime, in seconds) is sufficient. Review network connectivity and service quotas.
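The `States.DataLimitExceeded` case above can be screened for client-side before starting an execution, since Step Functions caps each state's input/output at 256 KiB of UTF-8 JSON. A minimal sketch (the `check_payload_size` helper and constant name are inventions for illustration; only the 256 KiB quota comes from AWS documentation):

```python
import json

# Step Functions caps state input/output payloads at 256 KiB (262,144 bytes)
# of UTF-8-encoded JSON text; exceeding it raises States.DataLimitExceeded.
MAX_PAYLOAD_BYTES = 256 * 1024

def check_payload_size(payload):
    """Return the serialized size of `payload`; raise ValueError if it
    would exceed the Step Functions payload quota."""
    size = len(json.dumps(payload).encode("utf-8"))
    if size > MAX_PAYLOAD_BYTES:
        raise ValueError(
            f"payload is {size} bytes, exceeding the {MAX_PAYLOAD_BYTES}-byte limit"
        )
    return size
```

Running this check before `workflow.execute(inputs=...)` turns an opaque terminal execution failure into an early, local error.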
Warnings
- breaking Version 2.0.0 dropped support for Python 2. Projects must use Python 3 or newer.
- breaking With version 2.0.0, if your project uses the Amazon SageMaker Python SDK, it must be upgraded to version 2.x or later.
- breaking For `TrainingStep` and `TuningStep`, `sagemaker.session.s3_input` has been renamed to `sagemaker.inputs.TrainingInput` in SageMaker SDK v2.
- gotcha Prior to v2.3.0, placeholder hyperparameters passed to `TrainingStep` could be overwritten or not correctly applied if also specified in the estimator definition.
- gotcha IAM permissions are a frequent source of errors. Step Functions requires an execution role with permissions to invoke target services (e.g., SageMaker, Lambda, Glue) and manage workflow executions. Ensure granular permissions are granted.
- gotcha The `States.ALL` error catcher in Step Functions does not catch all errors; specifically, `States.DataLimitExceeded` is a terminal error that cannot be caught.
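Retry and Catch rules like those mentioned above can be sketched directly as Amazon States Language, which is what the SDK generates under the hood. A hedged sketch as a plain Python dict; the resource ARN and the "HandleFailure" state name are placeholders, and `States.DataLimitExceeded` will still bypass the `States.ALL` catcher:

```python
import json

# Hedged ASL sketch of a Task state with retry and catch rules.
# "HandleFailure" must name another state defined in the same state machine.
task_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",  # placeholder
    "Retry": [
        {
            "ErrorEquals": ["States.Timeout"],  # retry transient timeouts
            "IntervalSeconds": 10,
            "MaxAttempts": 2,
            "BackoffRate": 2.0,
        }
    ],
    "Catch": [
        {
            # Catches remaining errors -- except terminal ones such as
            # States.DataLimitExceeded, which cannot be caught.
            "ErrorEquals": ["States.ALL"],
            "Next": "HandleFailure",
        }
    ],
    "End": True,
}

asl = json.dumps(task_state, indent=2)
```

Comparing a sketch like this against `workflow.definition.to_json(pretty=True)` is a quick way to confirm the SDK emitted the error handling you intended.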
Install
- pip install stepfunctions
- pip install stepfunctions[sagemaker]
Imports
- Workflow
from stepfunctions.workflow import Workflow
- steps
from stepfunctions import steps
- ExecutionInput
from stepfunctions.inputs import ExecutionInput
- TrainingStep
from stepfunctions.steps import TrainingStep
from stepfunctions.steps.sagemaker import TrainingStep
Quickstart
import os

from stepfunctions.steps import Pass
from stepfunctions.workflow import Workflow

# Credentials: ensure AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY (and optionally
# AWS_SESSION_TOKEN) plus AWS_REGION are set in the environment, or that
# ~/.aws/credentials and ~/.aws/config are configured. For actual deployments,
# you'd typically use an IAM role.

# Define a simple Pass state that echoes its input.
pass_step = Pass(
    state_id='MyPassState',
    parameters={
        'input_data.$': '$'
    }
)

# Create a workflow. Replace the fallback ARN with your own IAM role ARN.
workflow = Workflow(
    name='MySimpleWorkflow',
    definition=pass_step,
    role=os.environ.get(
        'STEPFUNCTIONS_EXECUTION_ROLE_ARN',
        'arn:aws:iam::123456789012:role/FakeStepFunctionsExecutionRole'
    )
)

try:
    # Create the state machine on AWS Step Functions.
    workflow.create()
    print(f"Workflow '{workflow.name}' created successfully.")

    # Execute the workflow with sample input.
    execution = workflow.execute(inputs={'message': 'Hello from Step Functions SDK!'})
    print(f"Workflow execution started with ARN: {execution.execution_arn}")

    # Block until the execution completes, then print its output.
    print(f"Execution finished. Output: {execution.get_output(wait=True)}")

    # Clean up (optional; for real workflows, you might not delete immediately).
    # workflow.delete()
    # print(f"Workflow '{workflow.name}' deleted.")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your AWS credentials are configured and the IAM role ARN is valid and has necessary permissions.")
    print("You can define the IAM role ARN as an environment variable STEPFUNCTIONS_EXECUTION_ROLE_ARN.")
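If you prefer explicit polling over a blocking call (for example, when working with a plain boto3 Step Functions client rather than the SDK's execution object), a loop over `describe_execution` works. A sketch under stated assumptions: `wait_for_execution` and `TERMINAL_STATUSES` are names invented here, and `sfn_client` is any object with a boto3-style `describe_execution(executionArn=...)` method:

```python
import time

# Terminal statuses reported by the Step Functions DescribeExecution API.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}

def wait_for_execution(sfn_client, execution_arn, poll_seconds=5.0):
    """Poll DescribeExecution until the execution reaches a terminal
    status, then return the final response dict."""
    while True:
        response = sfn_client.describe_execution(executionArn=execution_arn)
        if response["status"] in TERMINAL_STATUSES:
            return response
        time.sleep(poll_seconds)
```

With a real client this would be called as `wait_for_execution(boto3.client('stepfunctions'), execution.execution_arn)`; inspect the returned dict's `status` field to distinguish success from failure.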