PyAthena

version 3.30.1 · verified Tue May 12 · auth: no · python install: verified

PyAthena is a Python DB API 2.0 (PEP 249) client for Amazon Athena, enabling SQL queries on data stored in Amazon S3. It provides a familiar interface for database interactions, supports various cursor types (e.g., standard, Pandas, Arrow), SQLAlchemy integration, and asynchronous query execution. The library is actively maintained with frequent updates.

pip install pyathena
error ModuleNotFoundError: No module named 'pyathena'
cause The `pyathena` library is not installed in the Python environment, or the Python interpreter cannot locate it in its search path.
fix
Install the library with `pip install pyathena`, or activate the virtual environment where it is already installed.
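To confirm the active interpreter can actually see the package (a mismatched virtual environment is the usual culprit), a standard-library-only check:

```python
import importlib.util
import sys

def module_available(name: str) -> bool:
    """Return True if `name` can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None

if not module_available("pyathena"):
    # The interpreter running your script is what matters, not the shell's default pip
    print(f"pyathena is not importable from {sys.executable}; "
          f"run: {sys.executable} -m pip install pyathena")
```

Installing via `python -m pip install pyathena` with the same interpreter that runs your script avoids installing into a different environment.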
error pyathena.error.DatabaseError: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: This functionality is not enabled in the selected engine version.
cause This error occurs when attempting to use a feature, such as result reuse, that requires Athena engine version 3, but the configured workgroup is using an older engine version.
fix
Configure your Athena workgroup to use Athena engine version 3, or disable the feature causing the error (e.g., result_reuse_enable=False) if backward compatibility is required.
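As a sketch of the second option, the reuse flag can be disabled at connect time (bucket and region below are placeholders; the `connect` call is commented out because it requires live AWS access):

```python
# from pyathena import connect  # uncomment once pyathena is installed

connect_kwargs = dict(
    s3_staging_dir="s3://your-bucket/athena-results/",  # placeholder bucket
    region_name="us-east-1",                            # placeholder region
    result_reuse_enable=False,  # skip result reuse on pre-v3 engine workgroups
)
# conn = connect(**connect_kwargs)
```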
error botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the StartQueryExecution operation: User: arn:aws:iam::... is not authorized to perform: athena:StartQueryExecution on resource: ...
cause The AWS IAM user or role configured for PyAthena lacks the necessary permissions to execute Athena queries or access the specified S3 staging directory.
fix
Grant the required AWS IAM permissions (athena:StartQueryExecution, athena:GetQueryExecution, athena:GetQueryResults, s3:GetObject, s3:ListBucket, s3:PutObject, s3:DeleteObject for the staging S3 bucket) to the IAM entity PyAthena is using.
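As a sketch, a minimal IAM policy granting those actions might look like the following (account ID, region, workgroup, and bucket names are placeholders; scope the `Resource` entries to your own ARNs):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults"
      ],
      "Resource": "arn:aws:athena:us-east-1:123456789012:workgroup/primary"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-athena-results-bucket",
        "arn:aws:s3:::your-athena-results-bucket/*"
      ]
    }
  ]
}
```

Queries against tables registered in the AWS Glue Data Catalog typically also need `glue:GetDatabase`/`glue:GetTable`-style permissions, and reading the source data requires `s3:GetObject` on the data bucket as well.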
error botocore.exceptions.ParamValidationError: Parameter validation failed:
cause An invalid or incorrectly formatted parameter was passed to an underlying `boto3` call made by PyAthena, often due to a mismatch with expected types or values by the AWS API.
fix
Review the PyAthena connection parameters and query arguments, ensuring they conform to the expected types and formats as per PyAthena and boto3 documentation for the Athena service.
error AttributeError: 'NoneType' object has no attribute 'get'
cause This generic Python error occurs in PyAthena when an operation attempts to access an attribute (like `get`) on an object that is `None`, usually because a preceding step (e.g., connection, query execution, or result fetching) failed to return a valid object and instead returned `None`.
fix
Implement robust error handling and None checks around PyAthena API calls, especially for connect(), cursor.execute(), and result fetching methods, to identify and handle cases where these operations might fail and return None.
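For example, the fetch step can be wrapped in a small helper that tolerates a missing row; the `FakeCursor` below is a stand-in for illustration only (with PyAthena you would pass `conn.cursor()`):

```python
def fetch_one_safe(cursor, default=None):
    """Return the first row of the last query, or `default` if there is none."""
    row = cursor.fetchone()
    return row if row is not None else default

# Stand-in cursor for illustration; any DB API 2.0 cursor works the same way
class FakeCursor:
    def __init__(self, rows):
        self._rows = list(rows)

    def fetchone(self):
        return self._rows.pop(0) if self._rows else None

print(fetch_one_safe(FakeCursor([(1, "hello")])))  # (1, 'hello')
print(fetch_one_safe(FakeCursor([]), default=()))  # ()
```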
breaking Starting with PyAthena v3.30.0, the library no longer infers Python types for scalar values inside complex Athena types (e.g., '123' to 123 in structs/arrays). Values are kept as strings unless `result_set_type_hints` is provided.
fix If your code relies on the previous heuristic type inference for complex types, explicitly provide `result_set_type_hints` in your `connect` or `cursor.execute()` calls to specify the expected Athena DDL type signatures for affected columns. Otherwise, adapt your code to handle string values for complex type elements.
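A sketch of passing hints at execute time; the column names below are placeholders, and the exact hint format (column name to Athena DDL type signature is assumed here) should be checked against the PyAthena documentation for your version:

```python
# Hypothetical hint mapping: column name -> Athena DDL type signature.
# Verify the expected format in the PyAthena docs before relying on this.
type_hints = {
    "tags": "array(varchar)",            # placeholder complex-typed column
    "attributes": "map(varchar, int)",   # placeholder complex-typed column
}
# cursor.execute(sql, result_set_type_hints=type_hints)
```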
gotcha A connection needs a query result location and a region: pass `s3_staging_dir` (or a `work_group` whose settings define an output location) and `region_name` (or let boto3 resolve the region from its configuration). Providing neither results in a connection error.
fix Always pass `s3_staging_dir` (e.g., `s3://your-bucket/path/to/results/`) and `region_name` (e.g., `us-east-1`) to the `pyathena.connect()` function.
gotcha For very large query results, the default cursor might be slow as it fetches results in smaller chunks. This can lead to performance bottlenecks for extensive data analysis.
fix Consider using `PandasCursor` with the `chunksize` option (e.g., `cursor_class=PandasCursor, cursor_kwargs={'chunksize': 100000}`) for better memory management, or configure Athena to write results to S3 directly and then download/process the CSV file for optimal performance with massive datasets.
gotcha Ensure your AWS environment is correctly configured for authentication (e.g., IAM role, `~/.aws/credentials`, or environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`). PyAthena relies on `boto3` for credential resolution.
fix Verify `boto3`'s credential chain can find valid AWS credentials. For explicit control, you can pass `aws_access_key_id`, `aws_secret_access_key`, and `aws_session_token` directly to `pyathena.connect()`.
pip install "pyathena[sqlalchemy,pandas,arrow,polars]"
python  os/libc  variant  status  install  import  disk
3.10 alpine (musl) sqlalchemy,pandas,arrow,polars wheel - 0.01s 609.5M
3.10 alpine (musl) sqlalchemy,pandas,arrow,polars - - 0.01s 570.8M
3.10 alpine (musl) pyathena wheel - 0.01s 54.1M
3.10 alpine (musl) pyathena - - 0.01s 53.9M
3.10 slim (glibc) sqlalchemy,pandas,arrow,polars wheel 15.9s 0.01s 575M
3.10 slim (glibc) sqlalchemy,pandas,arrow,polars - - 0.01s 538M
3.10 slim (glibc) pyathena wheel 4.5s 0.01s 55M
3.10 slim (glibc) pyathena - - 0.01s 54M
3.11 alpine (musl) sqlalchemy,pandas,arrow,polars wheel - 0.02s 630.0M
3.11 alpine (musl) sqlalchemy,pandas,arrow,polars - - 0.03s 591.2M
3.11 alpine (musl) pyathena wheel - 0.02s 57.8M
3.11 alpine (musl) pyathena - - 0.03s 57.6M
3.11 slim (glibc) sqlalchemy,pandas,arrow,polars wheel 14.3s 0.02s 595M
3.11 slim (glibc) sqlalchemy,pandas,arrow,polars - - 0.02s 557M
3.11 slim (glibc) pyathena wheel 4.1s 0.02s 58M
3.11 slim (glibc) pyathena - - 0.02s 58M
3.12 alpine (musl) sqlalchemy,pandas,arrow,polars wheel - 0.01s 613.8M
3.12 alpine (musl) sqlalchemy,pandas,arrow,polars - - 0.02s 575.0M
3.12 alpine (musl) pyathena wheel - 0.02s 49.3M
3.12 alpine (musl) pyathena - - 0.02s 49.1M
3.12 slim (glibc) sqlalchemy,pandas,arrow,polars wheel 14.1s 0.01s 579M
3.12 slim (glibc) sqlalchemy,pandas,arrow,polars - - 0.01s 541M
3.12 slim (glibc) pyathena wheel 3.6s 0.01s 50M
3.12 slim (glibc) pyathena - - 0.01s 50M
3.13 alpine (musl) sqlalchemy,pandas,arrow,polars wheel - 0.01s 612.2M
3.13 alpine (musl) sqlalchemy,pandas,arrow,polars - - 0.01s 573.3M
3.13 alpine (musl) pyathena wheel - 0.01s 49.0M
3.13 alpine (musl) pyathena - - 0.01s 48.7M
3.13 slim (glibc) sqlalchemy,pandas,arrow,polars wheel 13.6s 0.01s 577M
3.13 slim (glibc) sqlalchemy,pandas,arrow,polars - - 0.01s 539M
3.13 slim (glibc) pyathena wheel 3.2s 0.01s 49M
3.13 slim (glibc) pyathena - - 0.01s 49M
3.9 alpine (musl) sqlalchemy,pandas,arrow,polars wheel - 0.01s 387.1M
3.9 alpine (musl) sqlalchemy,pandas,arrow,polars - - 0.01s 387.0M
3.9 alpine (musl) pyathena wheel - 0.01s 52.8M
3.9 alpine (musl) pyathena - - 0.01s 52.7M
3.9 slim (glibc) sqlalchemy,pandas,arrow,polars wheel 16.4s 0.01s 363M
3.9 slim (glibc) sqlalchemy,pandas,arrow,polars - - 0.01s 363M
3.9 slim (glibc) pyathena wheel 5.2s 0.01s 53M
3.9 slim (glibc) pyathena - - 0.01s 53M

This quickstart demonstrates how to establish a connection to Amazon Athena, execute a simple SQL query, and fetch results using `pyathena`. It expects AWS credentials to be configured via environment variables, IAM roles, or `~/.aws/credentials` (handled by `boto3`). The `s3_staging_dir` and `region_name` are mandatory connection parameters.

import os
from pyathena import connect

# Configure these environment variables or replace with actual values
# AWS_S3_STAGING_DIR: S3 path for Athena query results (e.g., "s3://my-athena-results-bucket/")
# AWS_REGION_NAME: AWS region (e.g., "us-east-1")
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN will be picked up by boto3 if not explicitly passed
s3_staging_dir = os.environ.get('AWS_S3_STAGING_DIR', 's3://your-athena-query-results-bucket/')
region_name = os.environ.get('AWS_REGION_NAME', 'us-east-1')
aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID')
aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
aws_session_token = os.environ.get('AWS_SESSION_TOKEN')

# Ensure mandatory parameters are set
if not s3_staging_dir.startswith('s3://') or not region_name:
    print("Error: AWS_S3_STAGING_DIR and AWS_REGION_NAME must be set correctly.")
else:
    try:
        # Connect to Athena
        conn = connect(
            s3_staging_dir=s3_staging_dir,
            region_name=region_name,
            aws_access_key_id=aws_access_key_id, # Optional: boto3 usually handles this
            aws_secret_access_key=aws_secret_access_key, # Optional
            aws_session_token=aws_session_token # Optional
        )
        cursor = conn.cursor()

        # Execute a sample query
        cursor.execute("SELECT 1 as one, 'hello' as greeting")

        # Fetch results
        print("Query Results:")
        for row in cursor.fetchall():
            print(row)

        # Close cursor and connection
        cursor.close()
        conn.close()

    except Exception as e:
        print(f"An error occurred: {e}")
        print("Please ensure AWS credentials are configured (e.g., via environment variables, ~/.aws/credentials, or IAM role) and AWS_S3_STAGING_DIR and AWS_REGION_NAME are set correctly.")