Dagster AWS
dagster-aws provides a collection of integrations for common AWS services, enabling Dagster to orchestrate workloads involving S3, ECS, Lambda, EMR, and more. It offers resources, run launchers, and I/O managers that connect Dagster assets and ops to your AWS infrastructure. The current version is 0.28.22; releases are typically monthly, alongside Dagster core updates.
Common errors
- ModuleNotFoundError: No module named 'dagster_aws'
  Cause: the `dagster-aws` Python package is not installed in the environment where Dagster loads your code.
  Fix: run `pip install dagster-aws` in your Python environment. If you need extras, specify them, e.g. `pip install 'dagster-aws[s3,ecs]'`.
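As a quick sanity check (pure stdlib, no assumptions about your setup), you can verify the package is importable before starting Dagster:

```python
import importlib.util

# True if dagster_aws can be imported in the current environment.
installed = importlib.util.find_spec("dagster_aws") is not None
print("dagster_aws installed:", installed)
```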
- botocore.exceptions.NoCredentialsError: Unable to locate credentials
- Access Denied (403 Forbidden) for S3 operations
  Cause: dagster-aws (which uses boto3 internally) cannot find valid AWS credentials, or the configured credentials/IAM role lack the necessary permissions for the S3 bucket or objects being accessed.
  Fix: ensure AWS credentials are available via environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION), a shared credentials file (~/.aws/credentials), or an IAM role attached to the EC2 instance/ECS task with sufficient S3 read/write permissions (s3:GetObject, s3:PutObject, s3:ListBucket). Verify the S3 bucket policy does not explicitly deny access, and configure `region_name` correctly on your `S3Resource`.
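A minimal sketch of an IAM policy granting just the S3 actions listed above; `your-bucket-name` is a placeholder, and a real policy may need more statements:

```python
import json

# Hypothetical minimal policy: s3:ListBucket applies to the bucket ARN itself,
# while s3:GetObject/s3:PutObject apply to the objects inside it.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::your-bucket-name",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::your-bucket-name/*",
        },
    ],
}
print(json.dumps(policy, indent=2))
```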
- ECS tasks failing to start, with CLI parsing errors or "Usage: dagster [OPTIONS] COMMAND [ARGS]..." in EcsRunLauncher logs
  Cause: the command `EcsRunLauncher` generates to execute a Dagster run (e.g., `dagster api execute_run <large JSON string>`) is not interpreted correctly by the shell in the ECS container, usually due to quoting or escaping issues with the large JSON argument.
  Fix: in your ECS task definition or EcsRunLauncher configuration, wrap the call in a shell invocation such as `/bin/bash -c "dagster api execute_run '<large JSON string>'"` so the argument is parsed as a single string.
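The fix can be sketched as a container command override; the payload here is a placeholder for the much larger JSON document that `EcsRunLauncher` actually generates:

```python
# Hypothetical run payload standing in for the real execute_run JSON.
payload = '{"run_id": "abc123"}'

# Wrapping the call in /bin/bash -c with single quotes keeps the JSON a single
# argv entry instead of being split apart and misparsed by the Dagster CLI.
command = ["/bin/bash", "-c", f"dagster api execute_run '{payload}'"]
print(command)
```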
- TypeError: the JSON object must be str, bytes or bytearray when retrieving secrets with EcsRunLauncher
  Cause: secrets that `EcsRunLauncher` injects from AWS Secrets Manager are available as environment variables in the run container, but not necessarily in the webserver (dagit) or dagster-daemon containers, or during early definition loading. Calling `json.loads(os.getenv('SECRET_VAR'))` too early (e.g., at module import time) fails if the variable is unset or not a valid JSON string in that context.
  Fix: access secrets dynamically within op or asset compute functions, where `EcsRunLauncher` guarantees the environment variables are populated. If secrets are required at definition time, use an alternative mechanism for injecting them into the definition loading process, or ensure the variable is set consistently across all Dagster components involved.
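A sketch of the lazy-access pattern, using a hypothetical `DB_SECRET` variable; in real code the body of `load_db_secret` would live inside an `@op` or `@asset` compute function:

```python
import json
import os

def load_db_secret() -> dict:
    # Read the variable at call time (inside the compute function), not at
    # module import time, so EcsRunLauncher has already populated it.
    raw = os.environ.get("DB_SECRET")
    if raw is None:
        raise RuntimeError("DB_SECRET is not set in this container")
    return json.loads(raw)
```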
Warnings
- Gotcha: Dagster library versions (e.g., `dagster-aws` 0.x.y) are tightly coupled to specific `dagster` core versions (e.g., 1.x.y). Always install compatible versions to avoid runtime errors; mismatched minor versions are a common source of issues. For `dagster-aws` 0.28.x, ensure `dagster` core is 1.12.x.
- Gotcha: all interactions with AWS services (S3, ECS, Lambda, EMR, etc.) require correct AWS credentials and IAM permissions. Ensure the underlying compute environment (e.g., EC2 instance, ECS task, EKS pod) has an appropriate IAM role attached, or explicitly configure credentials via environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`) or `~/.aws/credentials` for local development. Incorrect permissions lead to `AccessDenied` or `NoCredentialsError` exceptions.
- Gotcha: AWS resources like `S3Resource` may infer the AWS region from environment variables (`AWS_REGION`) or your `~/.aws/config` file. If running across multiple regions or in non-standard environments (e.g., localstack), always explicitly configure the `region_name` parameter to avoid unexpected cross-region errors or latency.
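One way to make the region explicit and fail fast is a small helper (a sketch; the env var names follow the warning above, and `AWS_DEFAULT_REGION` is boto3's other conventional variable):

```python
import os

def resolve_region(explicit=None):
    """Prefer an explicitly passed region; fall back to env vars; never guess."""
    region = (
        explicit
        or os.environ.get("AWS_REGION")
        or os.environ.get("AWS_DEFAULT_REGION")
    )
    if region is None:
        raise RuntimeError("No AWS region configured; pass region_name explicitly")
    return region
```

Pass the result as `region_name=resolve_region(...)` when constructing resources such as `S3Resource`.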
Install
-
pip install dagster-aws
Imports
- S3Resource
from dagster_aws.s3 import S3Resource
- S3PickleIOManager
from dagster_aws.s3 import S3PickleIOManager
- EcsRunLauncher
from dagster_aws.ecs import EcsRunLauncher
- PipesLambdaClient
from dagster_aws.pipes import PipesLambdaClient
- emr_pyspark_step_launcher
from dagster_aws.emr import emr_pyspark_step_launcher
Quickstart
import os

from dagster import AssetExecutionContext, Config, Definitions, asset
from dagster_aws.s3 import S3Resource


class MyS3Config(Config):
    bucket: str
    key: str


@asset
def my_s3_asset(context: AssetExecutionContext, s3: S3Resource, config: MyS3Config):
    """Writes a simple string to an S3 object."""
    s3.get_client().put_object(
        Bucket=config.bucket,
        Key=config.key,
        Body="Hello from Dagster S3!",
    )
    context.log.info(f"Wrote to s3://{config.bucket}/{config.key}")


defs = Definitions(
    assets=[my_s3_asset],
    resources={
        "s3": S3Resource(
            region_name=os.environ.get("AWS_REGION", "us-east-1"),
            # For local testing set these env vars; leaving them as None falls
            # back to the default boto3 credential chain (IAM role, ~/.aws/credentials).
            aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID"),
            aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY"),
        )
    },
)
# To run:
# 1. Ensure AWS credentials and AWS_REGION are set in your environment.
# 2. dagster dev -f your_file.py
# 3. In the UI, launch a run for 'my_s3_asset' with config like:
# {"ops": {"my_s3_asset": {"config": {"bucket": "your-bucket-name", "key": "my-dagster-object.txt"}}}}
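The same launchpad config expressed as a `run_config` dict, which you could pass to `dagster.materialize` or supply over the GraphQL API (bucket and key are placeholders):

```python
# Per-asset config lives under ops.<asset_name>.config when the asset takes a
# Config-typed parameter, as my_s3_asset does above.
run_config = {
    "ops": {
        "my_s3_asset": {
            "config": {
                "bucket": "your-bucket-name",
                "key": "my-dagster-object.txt",
            }
        }
    }
}
```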