AWS CDK S3 Tables Alpha
The `aws-cdk-aws-s3tables-alpha` module provides AWS CDK constructs for defining S3-backed tables, integrating with AWS Glue Data Catalog and allowing querying via services like Amazon Athena. As an 'alpha' module within the AWS CDK v2 ecosystem (current version 2.250.0a0), it offers early access to features, with a rapid release cadence that may include frequent breaking changes.
Common errors
- ModuleNotFoundError: No module named 'aws_cdk_aws_s3tables_alpha'
cause The `aws-cdk-aws-s3tables-alpha` Python package is not installed in the active environment.
fix Install the package: `pip install aws-cdk-aws-s3tables-alpha`. If the error persists after installation, check the import path; other CDK alpha packages are imported under the `aws_cdk` namespace (for example, `aws_cdk.aws_glue_alpha`).
- jsii.errors.JavaScriptError: Cannot read properties of undefined (reading 'PARQUET')
cause This error often occurs when `DataFormat` or `Column` is not imported or is referenced incorrectly within the `TableOptions` or `S3Table` definition.
fix Ensure `from aws_cdk_aws_s3tables_alpha import DataFormat, Column` is present and that `DataFormat.PARQUET` (or another format) and `Column(...)` objects are passed to `TableOptions`.
- jsii.errors.JavaScriptError: The 'databaseName' property is required for S3Table.
cause A required property such as `database_name` or `table_name` was omitted when instantiating `S3Table`.
fix Provide all required properties in the `S3Table` constructor. For example: `S3Table(self, 'Id', bucket=my_bucket, table_name='my-table', database_name='my-database', table_options=...)`.
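Before debugging jsii errors, it can help to confirm which import path actually resolves in the active environment. The following is a minimal sketch using only the standard library. The first candidate name is the one used on this page; the second follows the `aws_cdk.<name>_alpha` pattern of other CDK alpha packages and is an assumption, not something confirmed here. The helper names are hypothetical.

```python
import importlib.util

# First entry: import path used on this page.
# Second entry: ASSUMED alternative, following the `aws_cdk.<name>_alpha`
# pattern used by other CDK alpha packages.
CANDIDATES = ("aws_cdk_aws_s3tables_alpha", "aws_cdk.aws_s3tables_alpha")


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate `module_name` in this environment."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a spec is malformed.
        return False


def first_importable(candidates=CANDIDATES):
    """Return the first candidate that resolves, or None if none does."""
    for name in candidates:
        if is_importable(name):
            return name
    return None


if __name__ == "__main__":
    found = first_importable()
    print(found or "not installed: run `pip install aws-cdk-aws-s3tables-alpha`")
```

If neither candidate resolves, the package is most likely missing from the interpreter that runs `cdk synth`, which is common when a virtual environment is not activated.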
Warnings
- breaking This is an ALPHA module. Stability is not guaranteed. Breaking changes can and do occur frequently, even between minor versions. Always consult the latest module documentation before updating.
- gotcha CDK v1 vs. v2 import paths: This module is exclusively for AWS CDK v2. Importing v1-only modules (e.g., `from aws_cdk import core`) raises `ImportError` under CDK v2; import `App`, `Stack`, and service modules such as `aws_s3` from `aws_cdk` (the `aws-cdk-lib` package) instead.
- gotcha The `S3Table` construct creates AWS Glue Data Catalog tables. Ensure your CDK deployment role has sufficient IAM permissions to create, update, and delete Glue databases and tables, as well as S3 bucket permissions for the data.
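As a rough illustration of the Glue permissions mentioned above, the sketch below builds an IAM policy document scoped to one database and its tables. It assumes the standard `aws` partition and a single Glue database; the function name and the exact action list are illustrative, and the S3 bucket permissions for the data are deliberately left out.

```python
import json


def glue_deploy_policy(account: str, region: str, database: str) -> dict:
    """Build an IAM policy document for managing one Glue database and its tables.

    Illustrative only: the action list is a plausible minimum, not a verified
    requirement of this module.
    """
    arn_prefix = f"arn:aws:glue:{region}:{account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:CreateDatabase",
                    "glue:UpdateDatabase",
                    "glue:DeleteDatabase",
                    "glue:GetDatabase",
                    "glue:CreateTable",
                    "glue:UpdateTable",
                    "glue:DeleteTable",
                    "glue:GetTable",
                ],
                # Glue authorizes table operations against the catalog,
                # the database, and the table resources.
                "Resource": [
                    f"{arn_prefix}:catalog",
                    f"{arn_prefix}:database/{database}",
                    f"{arn_prefix}:table/{database}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    policy = glue_deploy_policy("123456789012", "us-east-1", "sales_data_db")
    print(json.dumps(policy, indent=2))
```

The account ID and region in the usage lines are placeholders. Attach the resulting document to whichever role CloudFormation uses to deploy the stack.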
Install
-
pip install aws-cdk-aws-s3tables-alpha aws-cdk-lib constructs
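Because alpha releases can break between versions, one option is to pin an exact version (for example `pip install aws-cdk-aws-s3tables-alpha==2.250.0a0`) and verify the pin before synthesizing. The sketch below uses only `importlib.metadata` from the standard library; the `PINNED` mapping and the helper names are hypothetical, and the version shown is simply the one quoted at the top of this page.

```python
from importlib import metadata

# Hypothetical pin table: distribution name -> exact version expected.
PINNED = {"aws-cdk-aws-s3tables-alpha": "2.250.0a0"}


def installed_version(dist_name: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def mismatches(pins=PINNED) -> dict:
    """Map each out-of-sync distribution to (expected, installed)."""
    result = {}
    for name, expected in pins.items():
        actual = installed_version(name)
        if actual != expected:
            result[name] = (expected, actual)
    return result


if __name__ == "__main__":
    for name, (expected, actual) in mismatches().items():
        print(f"{name}: expected {expected}, found {actual or 'not installed'}")
```

Running this in CI before `cdk synth` surfaces an accidental upgrade early, rather than as a jsii error at deploy time.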
Imports
- S3Table
from aws_cdk_aws_s3tables_alpha import S3Table
- TableOptions
from aws_cdk_aws_s3tables_alpha import TableOptions
- DataFormat
from aws_cdk_aws_s3tables_alpha import DataFormat
- Column
from aws_cdk_aws_s3tables_alpha import Column
- aws_s3 as s3
from aws_cdk import aws_s3 as s3  # equivalent in CDK v2: import aws_cdk.aws_s3 as s3
Quickstart
import os

from aws_cdk import App, Environment, RemovalPolicy, Stack
from aws_cdk.aws_s3 import Bucket
from aws_cdk_aws_s3tables_alpha import S3Table, TableOptions, DataFormat, Column
from constructs import Construct


class MyS3TableStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Define an S3 bucket to store the table data.
        # In production, use RemovalPolicy.RETAIN.
        data_bucket = Bucket(
            self, "MyDataLakeBucket",
            versioned=True,
            # CAUTION: DESTROY removes the bucket on stack delete. Deleting a
            # non-empty bucket fails unless auto_delete_objects=True is also set.
            removal_policy=RemovalPolicy.DESTROY,
        )

        # Define the S3 Table construct, creating a Glue table.
        s3_glue_table = S3Table(
            self, "MyProductSalesS3Table",
            bucket=data_bucket,
            table_name="product_sales",
            database_name="sales_data_db",  # The AWS Glue database name
            table_options=TableOptions(
                data_format=DataFormat.PARQUET,  # Common for data lakes
                columns=[
                    Column(name="product_id", type="string"),
                    Column(name="sale_date", type="date"),
                    Column(name="amount", type="double"),
                    Column(name="region", type="string"),
                ],
            ),
            # Example: partition by year and month for efficient querying
            # partition_keys=[
            #     Column(name="year", type="string"),
            #     Column(name="month", type="string"),
            # ],
        )

        # Properties such as table_name can be exposed as stack outputs
        # (requires `from aws_cdk import CfnOutput`):
        # CfnOutput(self, "S3BucketName", value=data_bucket.bucket_name)
        # CfnOutput(self, "GlueTableName", value=s3_glue_table.table_name)


# Standard CDK application entry point
app = App()
MyS3TableStack(
    app, "MyS3TableStack",
    env=Environment(
        account=os.environ.get("CDK_DEFAULT_ACCOUNT"),
        region=os.environ.get("CDK_DEFAULT_REGION"),
    ),
)
app.synth()
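The commented-out `partition_keys` in the quickstart suggest partitioning by year and month. The sketch below shows how such keys commonly map to object prefixes under the Hive-style `key=value` layout that Athena and Glue can read. Whether `S3Table` expects this layout is an assumption, and `partition_prefix` is a hypothetical helper, not part of the module.

```python
from datetime import date


def partition_prefix(table_name: str, sale_date: date) -> str:
    """Build a Hive-style prefix such as `table/year=YYYY/month=MM/`.

    Values are zero-padded so that string-typed partition keys sort
    chronologically.
    """
    return f"{table_name}/year={sale_date.year:04d}/month={sale_date.month:02d}/"


if __name__ == "__main__":
    # A Parquet file for a March 2024 sale would be written under this prefix.
    print(partition_prefix("product_sales", date(2024, 3, 9)))
```

Queries that filter on `year` and `month` can then skip every prefix outside the requested range, which is where the efficiency gain comes from.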