{"id":7757,"library":"stepfunctions","title":"AWS Step Functions Data Science SDK","description":"The AWS Step Functions Data Science SDK is an open-source Python library that allows data scientists to easily create, visualize, and execute machine learning (ML) workflows using Amazon SageMaker and AWS Step Functions. It enables the orchestration of AWS infrastructure at scale directly from Python code or Jupyter notebooks, abstracting away the need to provision and integrate AWS services separately. The library is currently at version 2.3.0 and maintains an active development and release cadence, with several updates per year.","status":"active","version":"2.3.0","language":"en","source_language":"en","source_url":"https://github.com/aws/aws-step-functions-data-science-sdk-python","tags":["aws","step-functions","sagemaker","workflow","data-science","mlops","orchestration","serverless"],"install":[{"cmd":"pip install stepfunctions","lang":"bash","label":"Core library"},{"cmd":"pip install stepfunctions[sagemaker]","lang":"bash","label":"With SageMaker dependencies"}],"dependencies":[{"reason":"Required for SageMaker-specific steps (e.g., TrainingStep, ProcessingStep). 
Can be installed as an extra to avoid heavy dependencies if not used.","package":"sagemaker","optional":true}],"imports":[{"symbol":"Workflow","correct":"from stepfunctions.workflow import Workflow"},{"symbol":"steps","correct":"from stepfunctions import steps"},{"symbol":"ExecutionInput","correct":"from stepfunctions.inputs import ExecutionInput"},{"note":"SageMaker-specific steps are located in the `stepfunctions.steps.sagemaker` submodule.","wrong":"from stepfunctions.steps import TrainingStep","symbol":"TrainingStep","correct":"from stepfunctions.steps.sagemaker import TrainingStep"}],"quickstart":{"code":"import os\nfrom stepfunctions.workflow import Workflow\nfrom stepfunctions.steps import Pass\n\n# This script sets no credentials itself; boto3 resolves them from the environment.\n# Ensure AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN (optional) and AWS_DEFAULT_REGION are set,\n# or that ~/.aws/credentials and ~/.aws/config are configured.\n# For actual deployment, you'd typically use an IAM role.\n\n# Define a simple Pass state\npass_step = Pass(\n    state_id='MyPassState',\n    parameters={\n        'input_data.$': '$'\n    }\n)\n\n# Create a workflow\nworkflow = Workflow(\n    name='MySimpleWorkflow',\n    definition=pass_step,\n    # Placeholder ARN: replace with your IAM role ARN or set STEPFUNCTIONS_EXECUTION_ROLE_ARN\n    role=os.environ.get('STEPFUNCTIONS_EXECUTION_ROLE_ARN', 'arn:aws:iam::123456789012:role/FakeStepFunctionsExecutionRole')\n)\n\ntry:\n    # Create the workflow on AWS Step Functions\n    workflow.create()\n    print(f\"Workflow '{workflow.name}' created successfully.\")\n\n    # Execute the workflow with sample input\n    execution = workflow.execute(inputs={'message': 'Hello from Step Functions SDK!'})\n    print(f\"Workflow execution started with ARN: {execution.execution_arn}\")\n\n    # Block until the execution reaches a terminal state (get_output(wait=True) polls for completion)\n    execution.get_output(wait=True)\n    print(f\"Execution finished. 
Output: {execution.get_output()}\")\n\n    # Clean up (optional, for real workflows, you might not delete immediately)\n    # workflow.delete()\n    # print(f\"Workflow '{workflow.name}' deleted.\")\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"Please ensure your AWS credentials are configured and the IAM role ARN is valid and has necessary permissions.\")\n    print(\"You can define the IAM role ARN as an environment variable STEPFUNCTIONS_EXECUTION_ROLE_ARN.\")","lang":"python","description":"This quickstart demonstrates how to define a simple Pass state, use it as the definition of a workflow, create the workflow on AWS Step Functions, and then execute it with sample input. It includes a placeholder IAM role ARN that must be replaced with a real one; AWS credentials must be configured in your environment or through the AWS CLI for actual deployment and execution."},"warnings":[{"fix":"Upgrade your Python environment to version 3.x.","message":"Version 2.0.0 dropped support for Python 2. Projects must use Python 3.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Upgrade `sagemaker` to version 2.x (`pip install sagemaker --upgrade`). Consult the SageMaker Python SDK v2 migration guide for detailed breaking changes.","message":"With version 2.0.0, if your project uses the Amazon SageMaker Python SDK, it must be upgraded to version 2.x or later.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Update import statements and usage of `sagemaker.session.s3_input` to `sagemaker.inputs.TrainingInput` when passing data to `TrainingStep` or `TuningStep`.","message":"For `TrainingStep` and `TuningStep`, `sagemaker.session.s3_input` has been renamed to `sagemaker.inputs.TrainingInput` in SageMaker SDK v2.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Upgrade to version 2.3.0 or later to ensure proper handling of placeholder hyperparameters in `TrainingStep`. 
Review how hyperparameters are defined and passed to avoid conflicts.","message":"Prior to v2.3.0, placeholder hyperparameters passed to `TrainingStep` could be overwritten or not correctly applied if also specified in the estimator definition.","severity":"gotcha","affected_versions":"<2.3.0"},{"fix":"Carefully review the IAM role associated with your Step Functions workflow. Grant only the necessary permissions (principle of least privilege) for each service API call within your state machine steps. Commonly missing permissions include `sagemaker:CreateTrainingJob` and `lambda:InvokeFunction`. For `DynamoDb` errors, ensure the service prefix is in PascalCase (e.g., `DynamoDb.ResourceInUseException`) for catch definitions.","message":"IAM permissions are a frequent source of errors. Step Functions requires an execution role with permissions to invoke target services (e.g., SageMaker, Lambda, Glue) and manage workflow executions. Ensure granular permissions are granted.","severity":"gotcha","affected_versions":"All"},{"fix":"Be aware that `States.ALL` will handle most but not all exceptions. Plan for `States.DataLimitExceeded` and `States.Runtime` to cause a workflow failure, and design workflows to prevent them where possible (e.g., by managing payload sizes).","message":"The `States.ALL` error catcher in Step Functions does not catch all errors; specifically, `States.DataLimitExceeded` is a terminal error that cannot be caught, and `States.Runtime` errors are not caught either.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Review the `role` ARN provided to the `Workflow` constructor. 
Ensure the IAM role's policy explicitly grants `Allow` access for the specific actions (e.g., `sagemaker:CreateTrainingJob`, `lambda:InvokeFunction`, `s3:GetObject`, `states:StartExecution`) on the required resources.","cause":"The IAM role associated with the Step Functions execution lacks the necessary permissions to call an AWS service API required by a state in the workflow.","error":"You don't have permissions to perform this action."},{"fix":"Thoroughly check the `state_id` values for uniqueness and length (max 128 characters). Validate the workflow structure, especially `Next` states and `Catch` rules. Use `workflow.definition.to_json(pretty=True)` to inspect the generated ASL definition and compare it against the Amazon States Language specification.","cause":"The JSON definition generated for the state machine contains syntax errors, invalid state names, or incorrect transitions.","error":"An error occurred (ValidationException) when calling the CreateStateMachine operation: State machine definition is invalid."},{"fix":"Consult the `stepfunctions` SDK documentation for the specific step or method being used to understand the expected input format and constraints. For SageMaker steps, refer to the SageMaker Python SDK documentation for `Estimator.fit()` or `TrainingInput` arguments. Check for `States.DataLimitExceeded` if the payload size is excessive.","cause":"Parameters passed to a step (e.g., `TrainingStep` `data` or `hyperparameters`) or to the workflow execution are not in the expected format or exceed size limits.","error":"The input doesn't meet the required format or constraints."},{"fix":"For Task states, increase the `TimeoutSeconds` (if applicable) or `HeartbeatSeconds` to allow more time. For long-running SageMaker jobs, ensure the SageMaker estimator's `max_run` (maximum training time in seconds) is sufficient. 
Review network connectivity and service quotas.","cause":"An AWS service task or an API call within a step took longer than its configured timeout, or a network issue prevented a timely response.","error":"The request timed out."}]}