PyAthenaJDBC: Amazon Athena JDBC driver wrapper for Python DB API 2.0
PyAthenaJDBC is a Python DB API 2.0 (PEP 249) compliant wrapper for Amazon Athena that uses the official JDBC driver via JPype. It lets you interact with Athena from Python using standard database connection patterns. The library is currently at version 3.0.1 and is updated frequently to support the latest Athena JDBC driver and to port features from its pure Python counterpart, PyAthena.
Warnings
- breaking: Version 3.0.0 dropped support for Python 2.7 and Python 3.5. It also redesigned the Formatter and Converter classes, which might affect custom type handling.
- breaking: Version 2.1.0 changed the argument names of the `connect` method to align with the JDBC driver's Driver Configuration Options. For example, `access_key` became `User`, `secret_key` became `Password`, `region_name` became `AwsRegion`, `schema_name` became `Schema`, and `s3_staging_dir` became `S3OutputLocation`.
- gotcha: The Amazon Athena JDBC driver download URL changed in v3.0.0 (for driver 2.0.15). If you are behind a strict firewall or proxy that whitelists specific URLs, this change might prevent the library from automatically downloading the driver JAR.
- gotcha: PyAthenaJDBC relies on JPype1 for its Java bridge. Historically, there have been specific JPype1 version incompatibilities (e.g., v2.0.6 pinned JPype1 to <=0.7.1). While newer versions aim for broader compatibility, always check the release notes when upgrading either library.
- gotcha: `S3OutputLocation` (formerly `s3_staging_dir`) has been optional in the `connect` method since v2.0.8, but Athena queries still require an S3 location to store query results. If you omit it from `connect`, it must be configured at the Athena Workgroup level or through other default settings; otherwise, queries will fail.
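When migrating call sites written against versions before 2.1.0, the renames listed above can be applied mechanically. The following is a minimal sketch, not part of PyAthenaJDBC: `LEGACY_TO_JDBC` and `migrate_connect_kwargs` are hypothetical names introduced for illustration, and the mapping covers only the five renames mentioned in the warning.

```python
# Legacy (pre-2.1.0) connect() argument names mapped to the JDBC-style names.
# Only the five renames listed in the warning above are included.
LEGACY_TO_JDBC = {
    "access_key": "User",
    "secret_key": "Password",
    "region_name": "AwsRegion",
    "schema_name": "Schema",
    "s3_staging_dir": "S3OutputLocation",
}


def migrate_connect_kwargs(kwargs):
    """Return a copy of kwargs with legacy names replaced by JDBC-style names.

    Hypothetical helper for illustration; it is not provided by PyAthenaJDBC.
    """
    migrated = {}
    for name, value in kwargs.items():
        new_name = LEGACY_TO_JDBC.get(name, name)
        if new_name in migrated:
            # The same option was passed under both its old and new name.
            raise ValueError(f"{new_name!r} was given more than once")
        migrated[new_name] = value
    return migrated


# Placeholder values; replace them with your own region, database, and bucket.
old_style = {
    "region_name": "us-west-2",
    "schema_name": "default",
    "s3_staging_dir": "s3://YOUR_BUCKET/path/",
}
new_style = migrate_connect_kwargs(old_style)
print(new_style)
```

The resulting dictionary could then be passed as `connect(**new_style)`. Arguments that are not in the mapping are passed through unchanged.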
Install
pip install pyathenajdbc
Imports
- connect
from pyathenajdbc.connection import Connection
from pyathenajdbc import connect
Quickstart
import os
from pyathenajdbc import connect

# Ensure AWS credentials (e.g., AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
# and region (AWS_REGION) are set via environment variables or boto3 config.
# S3OutputLocation is required for Athena query results.
# It's recommended to set it via an environment variable.
# Example: export AWS_ATHENA_S3_OUTPUT_LOCATION='s3://your-athena-query-results-bucket/'
s3_output_location = os.environ.get('AWS_ATHENA_S3_OUTPUT_LOCATION', '')
if not s3_output_location:
    raise ValueError("AWS_ATHENA_S3_OUTPUT_LOCATION environment variable must be set.")

conn = None  # Defined up front so the finally block can test it safely
try:
    conn = connect(
        AwsRegion=os.environ.get('AWS_REGION', 'us-east-1'),
        Schema='default',  # Your Athena database name
        S3OutputLocation=s3_output_location,
        # User=os.environ.get('AWS_ACCESS_KEY_ID'),  # Optional if using default credential chain
        # Password=os.environ.get('AWS_SECRET_ACCESS_KEY')  # Optional if using default credential chain
    )
    with conn.cursor() as cursor:
        cursor.execute("SELECT 1 as one_value")
        row = cursor.fetchone()
        print(f"Result from SELECT 1: {row}")

        cursor.execute("SHOW TABLES")
        tables = cursor.fetchall()
        print(f"Tables in 'default' schema: {tables}")
finally:
    # Close the connection only if connect() succeeded
    if conn is not None:
        conn.close()
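
Because the library follows PEP 249, generic DB API helpers written against the `cursor.description` and `fetchall()` interface should also apply to a PyAthenaJDBC cursor. The sketch below shows one such helper that turns rows into dictionaries keyed by column name. `rows_as_dicts` is a hypothetical name, and the standard-library `sqlite3` module stands in for an Athena connection so that the example runs without AWS access; behavior against Athena is assumed, not verified here.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Yield each remaining row of a PEP 249 cursor as a dict keyed by column name.

    Hypothetical helper for illustration; it is not provided by PyAthenaJDBC.
    """
    # Per PEP 249, the first item of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    for row in cursor.fetchall():
        yield dict(zip(columns, row))


# sqlite3 is used only as an offline stand-in for an Athena connection.
conn = sqlite3.connect(":memory:")
try:
    cursor = conn.cursor()
    cursor.execute("SELECT 1 AS one_value, 'a' AS label")
    result = list(rows_as_dicts(cursor))
    print(result)
finally:
    conn.close()
```

With an Athena connection, the same helper would be called right after `cursor.execute(...)` in the Quickstart, in place of the direct `fetchall()` call.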