dbt-athena
dbt-athena is an adapter plugin for dbt (data build tool) that lets data analysts and engineers transform data in Amazon Athena, a serverless query service for data stored in S3, using SQL. It is currently at version 1.10.0 and is actively maintained by dbt Labs. Releases generally follow semantic versioning: patch releases are frequent, minor versions add backward-compatible features, and major versions may introduce breaking changes.
Warnings
- breaking As of dbt-core 1.8, installing dbt-athena (or any other adapter) no longer installs `dbt-core` automatically. Install `dbt-core` explicitly alongside the adapter and make sure the two versions are compatible.
- gotcha The `dbt-athena-community` package is now a wrapper around `dbt-athena`. It still works for backward compatibility, but new projects and migrations should depend on `dbt-athena` directly.
- gotcha If a dbt model has the same name as an existing table in the AWS Glue catalog, or is configured to use the same S3 location as an existing table, the adapter *deletes* the files in that table's S3 location before recreating the table, so that old and new data do not conflict.
- gotcha The `num_retries` parameter in `profiles.yml`, which controls query retries, is often misunderstood: setting `num_retries: N` gives `N-1` retries after the initial attempt. To get exactly one retry, set `num_retries: 2`.
- gotcha Athena limits a single write operation (the Athena SQL that dbt-athena runs for most operations) to 100 partitions. This can cause full table rebuilds or large incremental merges to fail on tables with many partitions.
- gotcha Use lowercase table, schema, and database names with dbt-athena to avoid conflicts and other issues caused by Athena's case-insensitive handling of names.
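The `num_retries` off-by-one is easiest to see as attempt counting. The sketch below models the semantics described in the warning, assuming `num_retries` caps the *total* number of attempts (initial attempt included); it is not dbt-athena's retry code, and `count_attempts` is a hypothetical name.

```python
from typing import Tuple


def count_attempts(num_retries: int, failures_before_success: int) -> Tuple[bool, int]:
    """Simulate a query that fails `failures_before_success` times, then succeeds.

    Assumption: num_retries caps TOTAL attempts, so retries = num_retries - 1.
    Returns (succeeded, attempts_made).
    """
    attempts = 0
    while attempts < num_retries:
        attempts += 1
        if attempts > failures_before_success:
            return True, attempts
    return False, attempts


# num_retries: 1 -> one attempt, no retry: a single transient failure is fatal.
print(count_attempts(num_retries=1, failures_before_success=1))  # (False, 1)
# num_retries: 2 -> initial attempt plus one retry: the transient failure is survived.
print(count_attempts(num_retries=2, failures_before_success=1))  # (True, 2)
```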
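One general workaround for the 100-partition limit is to split a large write into several smaller writes that each touch at most 100 partitions. The sketch below shows only the batching arithmetic; the table names, the `dt` partition column and both helper functions are hypothetical, values are interpolated without escaping, and dbt-athena's own handling of this limit may differ.

```python
from datetime import date, timedelta
from typing import Iterator, List, Sequence

# Athena's per-write partition limit, as stated in the warning above.
MAX_PARTITIONS_PER_WRITE = 100


def partition_batches(values: Sequence[str], batch_size: int = MAX_PARTITIONS_PER_WRITE) -> Iterator[List[str]]:
    """Yield consecutive chunks of partition values, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(values), batch_size):
        yield list(values[start:start + batch_size])


def batched_inserts(target: str, source: str, partition_col: str, values: Sequence[str]) -> List[str]:
    """Build one INSERT ... SELECT per batch (hypothetical table and column names).

    Values are interpolated as-is (no escaping), so only use trusted literals.
    """
    statements = []
    for batch in partition_batches(values):
        in_list = ", ".join(f"'{value}'" for value in batch)
        statements.append(
            f"INSERT INTO {target} SELECT * FROM {source} WHERE {partition_col} IN ({in_list})"
        )
    return statements


# 250 hypothetical daily partitions -> batches of 100, 100 and 50.
days = [(date(2024, 1, 1) + timedelta(days=i)).isoformat() for i in range(250)]
print([len(batch) for batch in partition_batches(days)])  # [100, 100, 50]
print(len(batched_inserts("analytics.events", "staging.events", "dt", days)))  # 3
```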
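The lowercase recommendation can be enforced with a small pre-flight check that compares each name with its lowercased form. The helper name and the sample names are hypothetical.

```python
from typing import Dict, List


def non_lowercase_names(names: Dict[str, str]) -> List[str]:
    """Return the labels whose names contain uppercase characters."""
    return [label for label, name in names.items() if name != name.lower()]


# Hypothetical schema and model names.
names = {"schema": "dbt_schema", "model": "MyFirstModel"}
print(non_lowercase_names(names))  # ['model']
```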
Install
- pip install dbt-core dbt-athena
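Because the adapter no longer pulls in `dbt-core` on its own, it can help to confirm that both distributions are actually installed. This sketch uses only the standard library (`importlib.metadata`); `installed_versions` is a hypothetical helper, and `dbt --version` on the command line reports similar information.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names installed by `pip install dbt-core dbt-athena`.
REQUIRED_DISTRIBUTIONS = ("dbt-core", "dbt-athena")


def installed_versions(names: Iterable[str] = REQUIRED_DISTRIBUTIONS) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    versions = installed_versions()
    missing = [name for name, ver in versions.items() if ver is None]
    if missing:
        print("Missing distributions:", ", ".join(missing))
    else:
        print("Installed:", versions)
```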
Imports
- dbt-athena
dbt-athena is used through the dbt CLI and `profiles.yml` configuration; end users typically do not need to import it directly in Python.
Quickstart
# 1. Install dbt-core and the dbt-athena adapter
pip install dbt-core dbt-athena

# 2. Initialize a new dbt project (follow the prompts and select 'athena' as the database type)
dbt init my_athena_project

# 3. Configure profiles.yml (e.g., ~/.dbt/profiles.yml or <project_root>/profiles.yml)
# Example profiles.yml content:
# my_athena_project:
#   target: dev
#   outputs:
#     dev:
#       type: athena
#       s3_staging_dir: s3://your-athena-query-results-bucket/dbt-staging/
#       region_name: us-east-1
#       database: awsdatacatalog   # the Athena data catalog (awsdatacatalog is the default Glue catalog)
#       schema: dbt_schema         # the Athena (Glue) database that models are built into
#       threads: 4
#       aws_profile_name: default  # or use aws_access_key_id and aws_secret_access_key

# 4. Prerequisites (AWS CLI or Console steps, not Python code):
#    an S3 bucket for query results and, optionally, the Glue database used as `schema`
# aws s3 mb s3://your-athena-query-results-bucket
# aws glue create-database --database-input '{"Name": "dbt_schema"}'
# Or use an existing bucket and database

# 5. Create a sample model in my_athena_project/models/my_first_model.sql containing:
# select 1 as id, 'hello' as message

# 6. From the project directory, test the connection and run the project
cd my_athena_project
dbt debug --target dev
dbt run --target dev
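`dbt debug` is the authoritative check of a profile, but a quick local sanity check of a target dict can catch misspelled keys or a staging path without the `s3://` scheme before any AWS call is made. The sketch below assumes a target shaped like the example profile above; `profile_problems` and the placeholder values are hypothetical, and the key list covers only the keys that example uses, not every option the adapter accepts.

```python
from typing import Dict, List

# Keys used by the example `dev` target in the Quickstart above.
EXPECTED_KEYS = ("type", "s3_staging_dir", "region_name", "database", "schema")


def profile_problems(target: Dict[str, object]) -> List[str]:
    """Return human-readable problems found in one profiles.yml target."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in target]
    if target.get("type") not in (None, "athena"):
        problems.append("type should be 'athena'")
    staging = target.get("s3_staging_dir")
    if isinstance(staging, str) and not staging.startswith("s3://"):
        problems.append("s3_staging_dir should be an s3:// URI")
    return problems


dev_target = {
    "type": "athena",
    "s3_staging_dir": "s3://your-athena-query-results-bucket/dbt-staging/",
    "region_name": "us-east-1",
    "database": "awsdatacatalog",
    "schema": "dbt_schema",
    "threads": 4,
}
print(profile_problems(dev_target))  # []
```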