AWS CDK AWS Glue Alpha Construct Library
The AWS CDK Construct Library for AWS Glue is an experimental module that provides higher-level (L2) constructs for defining AWS Glue resources in CDK applications. It simplifies the creation of Glue jobs, databases, and other components through opinionated defaults that aim to follow best practices. Because it is an 'alpha' module, its APIs may change in backward-incompatible ways between releases; the current version is 2.248.0a0. The AWS CDK is released frequently, and its alpha modules are updated regularly alongside it.
Warnings
- breaking As an 'alpha' module, APIs in `aws-cdk-aws-glue-alpha` are experimental and subject to non-backward compatible changes or removal in any future version without adhering to semantic versioning. Developers should expect to update their source code when upgrading versions.
- breaking Recent releases of the Glue L2 constructs changed how Glue jobs are instantiated. Existing job definitions must be refactored to choose the job type and language explicitly (for example, a Python shell job or a Spark ETL job), typically through a job-type-specific construct such as `PythonShellJob` or `PySparkEtlJob`.
- gotcha The `Code.from_asset()` method (`Code.fromAsset()` in TypeScript) for specifying job scripts requires a path to a single *file*, not a directory. Passing a directory results in a validation error.
- gotcha The Glue L2 constructs are 'opinionated,' meaning they enforce best practices and may not allow creating resources with non-current Glue versions or deprecated language dependencies (e.g., older Python versions). For maximum flexibility, L1 (CloudFormation) constructs might be necessary.
- deprecated The `aws-cdk-aws-glue-alpha` module is expected to graduate from its alpha state into the core `aws-cdk-lib` package after a stabilization phase (typically around 3 months from its announcement). When that happens, the package name and import path will change, so imports of `aws_cdk.aws_glue_alpha` will need to be updated.
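The file-versus-directory gotcha above can be caught before the path ever reaches `Code.from_asset()`. The following is a minimal sketch using only the standard library; `resolve_script_file` is a hypothetical helper name, not part of the CDK, and the error messages are illustrative.

```python
from pathlib import Path


def resolve_script_file(path_str: str) -> str:
    """Return an absolute path string if path_str points to a regular file.

    Hypothetical helper (not part of the CDK): it raises ValueError early so
    the mistake surfaces before the path is handed to Code.from_asset().
    """
    path = Path(path_str).resolve()
    if path.is_dir():
        # Code.from_asset() expects a single script file, not a directory.
        raise ValueError(f"Expected a script file but got a directory: {path}")
    if not path.is_file():
        raise ValueError(f"Script file not found: {path}")
    return str(path)
```

The returned string could then be passed along as `glue_alpha.Code.from_asset(resolve_script_file("glue_assets/my_glue_script.py"))`, assuming a script exists at that path.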
Install
pip install aws-cdk-aws-glue-alpha
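Because alpha APIs can change between releases, it may help to pin the package to an exact version (for example `pip install aws-cdk-aws-glue-alpha==2.248.0a0`, using the version quoted above) and to verify the pin at runtime. The sketch below uses only `importlib.metadata` from the standard library; the helper names `installed_version` and `matches_pin` are hypothetical, and the version string is just the one mentioned in this document.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version string of dist_name, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(dist_name: str, pinned: str) -> bool:
    """True only when the distribution is installed at exactly the pinned version."""
    return installed_version(dist_name) == pinned
```

For instance, `matches_pin("aws-cdk-aws-glue-alpha", "2.248.0a0")` could be asserted at the top of a CDK app so that an unexpected upgrade fails fast instead of surfacing later as an API error.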
Imports
- glue_alpha
import aws_cdk.aws_glue_alpha as glue_alpha
Quickstart
import os
from pathlib import Path
import aws_cdk as cdk
import aws_cdk.aws_s3 as s3
import aws_cdk.aws_iam as iam
import aws_cdk.aws_glue_alpha as glue_alpha
# Create a dummy script file for the example Glue Job
script_content = """
import sys
from awsglue.utils import getResolvedOptions
print("Hello from Glue Job!")
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
print(f"Job Name: {args['JOB_NAME']}")
"""
script_dir = Path.cwd() / "glue_assets"
script_dir.mkdir(exist_ok=True)
script_path = script_dir / "my_glue_script.py"
script_path.write_text(script_content)
app = cdk.App()
stack = cdk.Stack(app, "MyGlueJobStack",
    env=cdk.Environment(
        account=os.environ.get('CDK_DEFAULT_ACCOUNT', 'YOUR_AWS_ACCOUNT_ID'),
        region=os.environ.get('CDK_DEFAULT_REGION', 'YOUR_AWS_REGION')
    )
)
# Example S3 bucket (for instance, for job input/output data). Note that
# Code.from_asset() uploads the script to the CDK bootstrap asset bucket, not to this one.
bucket = s3.Bucket(stack, "GlueScriptsBucket",
    removal_policy=cdk.RemovalPolicy.DESTROY,
    auto_delete_objects=True
)
# IAM role for the Glue Job
glue_role = iam.Role(stack, "GlueJobRole",
    assumed_by=iam.ServicePrincipal("glue.amazonaws.com"),
    managed_policies=[
        iam.ManagedPolicy.from_aws_managed_policy_name("service-role/AWSGlueServiceRole"),
        # Broad S3 access for simplicity only; scope this down outside of demos
        iam.ManagedPolicy.from_aws_managed_policy_name("AmazonS3FullAccess"),
    ]
)
# Define a Python shell Glue job. Recent alpha releases expose one construct per
# job type (PythonShellJob, PySparkEtlJob, ...); older releases instead used
# glue_alpha.Job(..., executable=glue_alpha.JobExecutable.python_shell(...)).
# Check the API reference for the version you have installed.
glue_job = glue_alpha.PythonShellJob(stack, "MyPythonShellJob",
    glue_version=glue_alpha.GlueVersion.V3_0,  # Specify a Glue version
    python_version=glue_alpha.PythonVersion.THREE_NINE,  # Specify Python 3.9
    script=glue_alpha.Code.from_asset(str(script_path)),  # Upload the script from a local file
    role=glue_role,
    job_name="MyPythonShellCDKJob",
    description="A simple Python Shell Glue Job created with CDK alpha construct.",
    tags={
        "Project": "CDKGlueAlphaDemo"
    }
)
cdk.CfnOutput(stack, "GlueJobName", value=glue_job.job_name)
cdk.CfnOutput(stack, "GlueScriptBucketName", value=bucket.bucket_name)
app.synth()
# Clean up the dummy script file and directory
script_path.unlink()
script_dir.rmdir()