AWS CDK S3 Tables Alpha

2.250.0a0 · active · verified Fri Apr 17

The `aws-cdk-aws-s3tables-alpha` module provides experimental AWS CDK L2 constructs for Amazon S3 Tables: table buckets, namespaces within them, and tables stored in the Apache Iceberg format. Once a table bucket is integrated with AWS analytics services through the AWS Glue Data Catalog, its tables can be queried with engines such as Amazon Athena. The module is an alpha module of AWS CDK v2 (current version 2.250.0a0). It is released alongside aws-cdk-lib but is not covered by semantic versioning, so any release may include breaking API changes.

Install

pip install aws-cdk-lib==2.250.0 aws-cdk-aws-s3tables-alpha==2.250.0a0
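
Alpha modules are published in lockstep with aws-cdk-lib: PyPI version 2.250.0a0 is the alpha build that accompanies the 2.250.0 release. The following is a minimal sketch that derives the matching aws-cdk-lib version from an alpha version string. It assumes the X.Y.ZaN naming convention. The helper `companion_cdk_lib_version` is hypothetical and is not part of the CDK:

```python
import re


def companion_cdk_lib_version(alpha_version: str) -> str:
    """Return the aws-cdk-lib release that an alpha module version pairs with.

    Assumes the lockstep convention: PyPI alpha version X.Y.ZaN -> aws-cdk-lib X.Y.Z.
    """
    match = re.fullmatch(r"(\d+\.\d+\.\d+)a\d+", alpha_version)
    if match is None:
        raise ValueError(f"not an alpha version string: {alpha_version!r}")
    return match.group(1)


if __name__ == "__main__":
    # Build the pinned requirement pair for the version shown on this page.
    alpha = "2.250.0a0"
    lib = companion_cdk_lib_version(alpha)
    print(f"aws-cdk-lib=={lib} aws-cdk-aws-s3tables-alpha=={alpha}")
```

Pinning both packages to the same release avoids mixing an alpha build with an aws-cdk-lib it was not built against.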

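Imports

import aws_cdk.aws_s3tables_alpha as s3tables

The PyPI package name uses hyphens, but the importable module is `aws_cdk.aws_s3tables_alpha`. There is no top-level `aws_cdk_aws_s3tables_alpha` module.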

Quickstart

This quickstart creates an Amazon S3 table bucket, a namespace inside it, and an Apache Iceberg table with four columns. Replace the placeholder names `sales-data-table-bucket`, `sales_data_db`, and `product_sales` with your own. Running the app synthesizes a CloudFormation template, which you can then deploy with `cdk deploy`.

import os

from aws_cdk import App, CfnOutput, Environment, Stack
from constructs import Construct
import aws_cdk.aws_s3tables_alpha as s3tables


class MyS3TableStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # A table bucket stores tables in the Apache Iceberg format
        table_bucket = s3tables.TableBucket(self, "SalesTableBucket",
            table_bucket_name="sales-data-table-bucket",
        )

        # A namespace groups tables inside a table bucket, similar to a database
        namespace = s3tables.Namespace(self, "SalesNamespace",
            namespace_name="sales_data_db",
            table_bucket=table_bucket,
        )

        # An Iceberg table with an explicit schema; column types use Iceberg type names.
        # Alpha API: verify these class and struct names against the version you install.
        s3tables.Table(self, "ProductSalesTable",
            table_name="product_sales",
            namespace=namespace,
            open_table_format=s3tables.OpenTableFormat.ICEBERG,
            iceberg_metadata=s3tables.IcebergMetadataProperty(
                iceberg_schema=s3tables.IcebergSchemaProperty(
                    schema_field_list=[
                        s3tables.SchemaFieldProperty(name="product_id", type="string", required=True),
                        s3tables.SchemaFieldProperty(name="sale_date", type="date"),
                        s3tables.SchemaFieldProperty(name="amount", type="double"),
                        s3tables.SchemaFieldProperty(name="region", type="string"),
                    ],
                ),
            ),
        )

        # Export the table bucket ARN so other stacks or tools can reference it
        CfnOutput(self, "TableBucketArn", value=table_bucket.table_bucket_arn)


# Standard CDK application entry point
app = App()
MyS3TableStack(app, "MyS3TableStack",
    env=Environment(
        account=os.environ.get("CDK_DEFAULT_ACCOUNT"),
        region=os.environ.get("CDK_DEFAULT_REGION"),
    ),
)
app.synth()
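
The placeholder names in the quickstart must follow the S3 Tables naming rules. This is a minimal standalone check that assumes those rules as summarized here. Table bucket names have 3 to 63 lowercase letters, numbers, or hyphens. Namespace and table names have 1 to 255 lowercase letters, numbers, or underscores. All names must begin and end with a letter or number. The function `check_names` and the bucket name `sales-data-table-bucket` are hypothetical. The official rules add further restrictions, such as reserved bucket-name prefixes and suffixes, that this sketch does not cover:

```python
import re

# Assumed S3 Tables naming rules (character set and length only).
# Table bucket: 3-63 chars of [a-z0-9-], starting and ending with [a-z0-9].
TABLE_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]")
# Namespace or table: 1-255 chars of [a-z0-9_], starting and ending with [a-z0-9].
NAMESPACE_OR_TABLE_NAME = re.compile(r"[a-z0-9](?:[a-z0-9_]{0,253}[a-z0-9])?")


def check_names(bucket_name: str, namespace_name: str, table_name: str) -> list[str]:
    """Return a list of problems; an empty list means all three names passed."""
    problems = []
    if not TABLE_BUCKET_NAME.fullmatch(bucket_name):
        problems.append(f"invalid table bucket name: {bucket_name!r}")
    if not NAMESPACE_OR_TABLE_NAME.fullmatch(namespace_name):
        problems.append(f"invalid namespace name: {namespace_name!r}")
    if not NAMESPACE_OR_TABLE_NAME.fullmatch(table_name):
        problems.append(f"invalid table name: {table_name!r}")
    return problems


if __name__ == "__main__":
    # All three names satisfy the assumed rules.
    print(check_names("sales-data-table-bucket", "sales_data_db", "product_sales"))  # []
    # Hyphens are allowed in bucket names but not in namespace names.
    print(check_names("sales-data-table-bucket", "sales-data", "product_sales"))
```

Hyphens and underscores are not interchangeable across the three levels, so a name copied from a bucket into a namespace or table is a common source of rejected deployments.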
