AWS CDK AWS Glue Alpha Construct Library

2.248.0a0 · active · verified Sun Apr 12

The AWS CDK Construct Library for AWS Glue is an experimental module that provides higher-level (L2) constructs for defining AWS Glue resources in CDK applications. It simplifies creating Glue jobs, databases, and related components by applying opinionated defaults that aim to follow best practices. Because it is an alpha module, its APIs may change in backward-incompatible ways between releases; the current version is 2.248.0a0. The AWS CDK releases frequently, and alpha modules are updated regularly.
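
Because alpha APIs can break between releases, pinning the exact version is usually the safer choice. The following is a minimal sketch, not an official tool: the helper name `pin_lines` is hypothetical, the distribution name `aws-cdk.aws-glue-alpha` is the PyPI package name, and the `aws-cdk-lib` range is an assumption that the alpha module needs a core library at or above the same base version within the same major version.

```python
import re
from typing import List

# Matches PEP 440 alpha versions such as "2.248.0a0".
ALPHA_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)a(\d+)$")


def pin_lines(alpha_version: str) -> List[str]:
    """Return requirements.txt lines that pin the Glue alpha module.

    Assumption (not taken from the package metadata): aws-cdk-lib must be
    at or above the alpha module's base version, within the same major.
    """
    match = ALPHA_VERSION.match(alpha_version)
    if match is None:
        raise ValueError(f"not an alpha version: {alpha_version!r}")
    major, minor, patch, _alpha_number = match.groups()
    base = f"{major}.{minor}.{patch}"
    return [
        f"aws-cdk-lib>={base},<{int(major) + 1}.0.0",
        f"aws-cdk.aws-glue-alpha=={alpha_version}",
    ]


if __name__ == "__main__":
    # Print lines that could be pasted into requirements.txt.
    print("\n".join(pin_lines("2.248.0a0")))
```

Pinning with `==` means an upgrade of the alpha module is always a deliberate change that can be reviewed against its release notes.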

Warnings

Install

Imports

Quickstart

This quickstart defines a basic AWS CDK stack that creates an S3 bucket, an IAM role for the Glue job, and a Python Shell Glue job using the `aws-cdk.aws-glue-alpha` construct library. The local Python script is uploaded as an S3 asset through `Code.from_asset`. Either set the `CDK_DEFAULT_ACCOUNT` and `CDK_DEFAULT_REGION` environment variables, or replace the 'YOUR_AWS_ACCOUNT_ID' and 'YOUR_AWS_REGION' placeholders in the code.

import os
from pathlib import Path
import aws_cdk as cdk
import aws_cdk.aws_s3 as s3
import aws_cdk.aws_iam as iam
import aws_cdk.aws_glue_alpha as glue_alpha

# Create a dummy script file for the example Glue Job
script_content = """
import sys
from awsglue.utils import getResolvedOptions

print("Hello from Glue Job!")
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
print(f"Job Name: {args['JOB_NAME']}")
"""
script_dir = Path.cwd() / "glue_assets"
script_dir.mkdir(exist_ok=True)
script_path = script_dir / "my_glue_script.py"
script_path.write_text(script_content)

app = cdk.App()
stack = cdk.Stack(app, "MyGlueJobStack",
    env=cdk.Environment(
        account=os.environ.get('CDK_DEFAULT_ACCOUNT', 'YOUR_AWS_ACCOUNT_ID'),
        region=os.environ.get('CDK_DEFAULT_REGION', 'YOUR_AWS_REGION')
    )
)

# S3 bucket for the job to use. Note: Code.from_asset below uploads the
# script to the CDK asset bucket, not to this bucket.
bucket = s3.Bucket(stack, "GlueScriptsBucket",
    removal_policy=cdk.RemovalPolicy.DESTROY,
    auto_delete_objects=True
)

# IAM role for the Glue Job
glue_role = iam.Role(stack, "GlueJobRole",
    assumed_by=iam.ServicePrincipal("glue.amazonaws.com"),
    managed_policies=[
        iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AWSGlueServiceRole"),
        iam.ManagedPolicy.from_aws_managed_policy_name("AmazonS3FullAccess"), # Broad S3 access for demo simplicity; scope this down for real workloads
    ]
)

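# Define a Python Shell Glue Job.
# Note: the alpha API has changed across releases. Newer releases may expose
# job-type-specific constructs (for example PythonShellJob) in place of
# Job/JobExecutable, so check the API reference for your installed version.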
glue_job = glue_alpha.Job(stack, "MyPythonShellJob",
    executable=glue_alpha.JobExecutable.python_shell(
        glue_version=glue_alpha.GlueVersion.V3_0, # Specify a Glue version
        python_version=glue_alpha.PythonVersion.THREE_NINE, # Specify Python 3.9
        script=glue_alpha.Code.from_asset(str(script_path)) # Upload script from local path
    ),
    role=glue_role,
    job_name="MyPythonShellCDKJob",
    description="A simple Python Shell Glue Job created with CDK alpha construct.",
    tags={
        "Project": "CDKGlueAlphaDemo"
    }
)

cdk.CfnOutput(stack, "GlueJobName", value=glue_job.job_name)
cdk.CfnOutput(stack, "GlueScriptBucketName", value=bucket.bucket_name)

app.synth()

# Clean up the dummy script file and directory
script_path.unlink()
script_dir.rmdir()
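
The quickstart falls back to placeholder strings when `CDK_DEFAULT_ACCOUNT` or `CDK_DEFAULT_REGION` is unset, so a forgotten variable can pass silently into the stack environment. The following is a minimal sketch of a guard that fails early instead. The helper name `resolve_env` is hypothetical and not part of the CDK; it only reads the two environment variables named above.

```python
import os
from typing import Dict, Mapping

# Placeholder strings used by the quickstart; treated here as "not configured".
PLACEHOLDERS = {"YOUR_AWS_ACCOUNT_ID", "YOUR_AWS_REGION"}


def resolve_env(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return account and region from the CDK default variables, or raise."""
    resolved = {
        "account": environ.get("CDK_DEFAULT_ACCOUNT", ""),
        "region": environ.get("CDK_DEFAULT_REGION", ""),
    }
    for key, value in resolved.items():
        # Reject both missing values and placeholders that were never replaced.
        if not value or value in PLACEHOLDERS:
            raise ValueError(
                f"{key} is not set; export CDK_DEFAULT_{key.upper()}"
            )
    return resolved
```

The result could then be passed to the stack as `env=cdk.Environment(**resolve_env())`, since `account` and `region` are the keyword arguments the quickstart already uses.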
