PyAthena
PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena, enabling SQL queries on data stored in Amazon S3. It provides a familiar interface for database interactions, supports various cursor types (e.g., standard, Pandas, Arrow), SQLAlchemy integration, and asynchronous query execution. The library is actively maintained with frequent updates.
Warnings
- breaking Starting with PyAthena v3.30.0, the library no longer infers Python types for scalar values inside complex Athena types (e.g., '123' to 123 in structs/arrays). Values are kept as strings unless `result_set_type_hints` is provided.
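- gotcha A connection needs a query result location and a region. In most setups this means passing `s3_staging_dir` and `region_name` explicitly. `s3_staging_dir` can be left out only when a `work_group` with a configured output location is used, and the region can alternatively come from the usual boto3 configuration. If neither source provides them, the connection fails.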
- gotcha For very large result sets, the default cursor can be slow because it pages through results via the Athena API in small chunks. For large analytical reads, consider `PandasCursor` or `ArrowCursor`, which read the result file from S3 instead of paging through the API.
- gotcha Ensure your AWS environment is correctly configured for authentication (e.g., IAM role, `~/.aws/credentials`, or environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`). PyAthena relies on `boto3` for credential resolution.
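The configuration points in the warnings above can be checked before a connection is opened. The following is a minimal sketch using only the standard library. `check_athena_settings` is a hypothetical helper, not part of PyAthena, and the rules it applies (an `s3://` scheme, a non-empty bucket name, a non-empty region) are assumptions about what a sensible configuration looks like.

```python
from urllib.parse import urlparse


def check_athena_settings(s3_staging_dir, region_name):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration, not part of PyAthena.
    """
    problems = []
    parsed = urlparse(s3_staging_dir or "")
    # A usable staging location has the s3 scheme and a bucket name.
    if parsed.scheme != "s3" or not parsed.netloc:
        problems.append("s3_staging_dir must look like s3://bucket/prefix/")
    if not region_name:
        problems.append("region_name is empty")
    return problems


print(check_athena_settings("s3://my-athena-results-bucket/", "us-east-1"))
print(check_athena_settings("my-athena-results-bucket", ""))
```

Running the check first gives a specific message for each misconfigured value instead of a generic connection error. It does not verify credentials or bucket permissions.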
Install
- pip install pyathena
- pip install "pyathena[sqlalchemy,pandas,arrow,polars]"
Imports
- connect
from pyathena import connect
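The overview mentions several cursor types, and each lives in its own module. The sketch below shows one possible way to pick a cursor class by name and import it only when needed, so that optional extras such as pandas or pyarrow are required only for the cursor that uses them. The module paths and class names are PyAthena's; the helper `load_cursor_class` and the short keys (`"default"`, `"pandas"`, `"arrow"`) are made up for this illustration.

```python
import importlib

# Module paths and class names as published by PyAthena. The dictionary
# keys are hypothetical labels chosen for this sketch.
CURSOR_CLASSES = {
    "default": ("pyathena.cursor", "Cursor"),
    "pandas": ("pyathena.pandas.cursor", "PandasCursor"),
    "arrow": ("pyathena.arrow.cursor", "ArrowCursor"),
}


def load_cursor_class(kind="default"):
    """Import and return the cursor class for `kind` (hypothetical helper)."""
    if kind not in CURSOR_CLASSES:
        raise ValueError(f"unknown cursor kind: {kind!r}")
    module_name, class_name = CURSOR_CLASSES[kind]
    # Lazy import: the optional dependency is only needed when the
    # matching cursor is actually requested.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

The returned class can be handed to `connect` through its `cursor_class` argument, for example `connect(s3_staging_dir=..., region_name=..., cursor_class=load_cursor_class("pandas"))`.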
Quickstart
import os

from pyathena import connect

# Configure these environment variables or replace with actual values
# AWS_S3_STAGING_DIR: S3 path for Athena query results (e.g., "s3://my-athena-results-bucket/")
# AWS_REGION_NAME: AWS region (e.g., "us-east-1")
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN will be picked up by boto3 if not explicitly passed
s3_staging_dir = os.environ.get('AWS_S3_STAGING_DIR', 's3://your-athena-query-results-bucket/')
region_name = os.environ.get('AWS_REGION_NAME', 'us-east-1')
aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
aws_session_token = os.environ.get('AWS_SESSION_TOKEN')

# Ensure mandatory parameters are set
if not s3_staging_dir.startswith('s3://') or not region_name:
    print("Error: AWS_S3_STAGING_DIR and AWS_REGION_NAME must be set correctly.")
else:
    try:
        # Connect to Athena
        conn = connect(
            s3_staging_dir=s3_staging_dir,
            region_name=region_name,
            aws_access_key_id=aws_access_key_id,  # Optional: boto3 usually handles this
            aws_secret_access_key=aws_secret_access_key,  # Optional
            aws_session_token=aws_session_token  # Optional
        )
        cursor = conn.cursor()

        # Execute a sample query
        cursor.execute("SELECT 1 as one, 'hello' as greeting")

        # Fetch results
        print("Query Results:")
        for row in cursor.fetchall():
            print(row)

        # Close cursor and connection
        cursor.close()
        conn.close()
    except Exception as e:
        print(f"An error occurred: {e}")
        print("Please ensure AWS credentials are configured (e.g., via environment variables, ~/.aws/credentials, or IAM role) and AWS_S3_STAGING_DIR and AWS_REGION_NAME are set correctly.")
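The quickstart closes the cursor and connection only when the query succeeds. A possible variation, sketched below, wraps both in `contextlib.closing` so they are closed even if the query raises, and passes query values as parameters instead of formatting them into the SQL string. `run_query` is a hypothetical helper, not a PyAthena function. It takes the connect callable as an argument (assumed here to be `pyathena.connect`), which also makes it easy to exercise with a stand-in object.

```python
from contextlib import closing


def run_query(connect_fn, sql, parameters=None, **connect_kwargs):
    """Run one query and return all rows, always closing cursor and connection.

    Hypothetical helper: `connect_fn` is expected to be `pyathena.connect`
    or any other DB API 2.0 style connect callable.
    """
    with closing(connect_fn(**connect_kwargs)) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(sql, parameters)
            return cursor.fetchall()


# Example call (needs AWS access, so it is left commented out):
# from pyathena import connect
# rows = run_query(
#     connect,
#     "SELECT %(value)s AS greeting",
#     {"value": "hello"},
#     s3_staging_dir="s3://your-athena-query-results-bucket/",
#     region_name="us-east-1",
# )
```

The commented call uses the `%(name)s` placeholder style with a dictionary of values, which is PyAthena's default parameter style.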