{"id":465,"library":"awswrangler","title":"AWS SDK for pandas (awswrangler)","description":"AWS SDK for pandas, also known as awswrangler, extends the popular Pandas library to simplify data integration with various AWS services. It provides high-level abstractions for common data engineering tasks like reading and writing data to Amazon S3, querying data in Athena and Redshift, and interacting with AWS Glue, DynamoDB, Timestream, and more. The library is actively maintained, with frequent releases, often on a monthly basis, and the current version is 3.15.1.","status":"active","version":"3.15.1","language":"python","source_language":"en","source_url":"https://github.com/aws/aws-sdk-pandas","tags":["aws","data","pandas","etl","s3","athena","glue","redshift","datalake","timestream","dynamodb","opensearch"],"install":[{"cmd":"pip install awswrangler","lang":"bash","label":"Basic installation"},{"cmd":"pip install 'awswrangler[redshift,mysql,postgresql]'","lang":"bash","label":"Installation with optional database drivers"}],"dependencies":[{"reason":"Core dependency for DataFrame operations.","package":"pandas"},{"reason":"Used for optimized data handling, especially Parquet. Frequently updated.","package":"pyarrow","optional":true},{"reason":"Underlying AWS SDK for interacting with AWS services.","package":"boto3"}],"imports":[{"symbol":"awswrangler","correct":"import awswrangler as wr"},{"symbol":"pandas","correct":"import pandas as pd"}],"quickstart":{"code":"import awswrangler as wr\nimport pandas as pd\nfrom datetime import datetime\nimport os\n\n# Ensure you have AWS credentials configured (e.g., via AWS CLI or environment variables)\n# For quickstart, ensure the S3_BUCKET is set in your environment\ns3_bucket = os.environ.get('S3_BUCKET', 'your-aws-s3-bucket-name')\nif s3_bucket == 'your-aws-s3-bucket-name':\n    print(\"WARNING: Please set the S3_BUCKET environment variable or replace 'your-aws-s3-bucket-name' in the code.\")\n\ndatabase_name = os.environ.get('ATHENA_DATABASE', 'awswrangler_db')\nif database_name == 'awswrangler_db':\n    print(\"WARNING: Using default Athena database 'awswrangler_db'. Consider setting ATHENA_DATABASE env var.\")\n\n# Create a sample DataFrame\ndf = pd.DataFrame({\n    \"id\": [1, 2],\n    \"value\": [\"foo\", \"boo\"],\n    \"timestamp\": [datetime.now(), datetime.now()]\n})\n\n# 1. Store data on S3 as Parquet and register with Glue Catalog\ns3_path = f\"s3://{s3_bucket}/awswrangler-quickstart/my_dataset/\"\nprint(f\"Writing DataFrame to S3: {s3_path}\")\nwr.s3.to_parquet(\n    df=df,\n    path=s3_path,\n    dataset=True,\n    database=database_name,\n    table=\"my_table_parquet\",\n    mode=\"overwrite\",\n    partition_cols=['value']\n)\nprint(\"Data written and cataloged.\")\n\n# 2. Retrieve the data directly from Amazon S3\nprint(f\"Reading data from S3: {s3_path}\")\ndf_from_s3 = wr.s3.read_parquet(s3_path, dataset=True)\nprint(f\"Read {len(df_from_s3)} rows from S3:\\n{df_from_s3}\")\n\n# 3. 
Retrieve the data from Amazon Athena\nprint(f\"Reading data from Athena table '{database_name}.my_table_parquet'\")\ndf_from_athena = wr.athena.read_sql_query(f\"SELECT * FROM my_table_parquet\", database=database_name)\nprint(f\"Read {len(df_from_athena)} rows from Athena:\\n{df_from_athena}\")\n\n# Example for Redshift (requires awswrangler[redshift] and a Glue connection)\n# try:\n#     # Replace 'my-glue-connection' with your actual Glue connection name\n#     con = wr.redshift.connect(\"my-glue-connection\")\n#     df_from_redshift = wr.redshift.read_sql_query(\"SELECT 1 as example_col\", con=con)\n#     print(f\"Read from Redshift:\\n{df_from_redshift}\")\n#     con.close()\n# except Exception as e:\n#     print(f\"Could not connect to Redshift or run query (this is expected if not configured): {e}\")\n","lang":"python","description":"This quickstart demonstrates how to use `awswrangler` to write a Pandas DataFrame to Amazon S3 as a Parquet dataset, register it in the AWS Glue Data Catalog, and then read it back using both direct S3 access and Amazon Athena. It assumes AWS credentials are configured in the environment (e.g., via `~/.aws/credentials` or environment variables) and requires an S3 bucket and an Athena database for execution. Database connection examples (e.g., Redshift) are commented out as they require additional setup."},"warnings":[{"fix":"Upgrade your Python environment to 3.10 or newer (currently up to 3.14 supported).","message":"Python 3.9 was dropped in version 3.15.0, and Python 3.8 in 3.11.0. Older Python versions are no longer supported. Ensure your environment uses Python >= 3.10.","severity":"breaking","affected_versions":">=3.11.0, >=3.15.0"},{"fix":"Specify the required extra packages during installation: `pip install 'awswrangler[feature1,feature2]'`. Refer to the documentation for available extras.","message":"Starting from version 3.0, feature-specific dependencies (e.g., for Redshift, MySQL, OpenSearch) must be installed explicitly using extras syntax (e.g., `pip install 'awswrangler[redshift]'`). Simply installing `awswrangler` will only include core dependencies.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"If encountering build issues, you can pin PyArrow to an older version (e.g., `pip install awswrangler 'pyarrow<21'`), upgrade CMake, or use a newer platform environment like AL2023-V1 for SageMaker. For Glue PySpark jobs, specific PyArrow versions may be required (e.g., `pyarrow==14,pandas==1.5.3,awswrangler==3.15.1`).","message":"AWS SDK for pandas versions `>=3.14.0` default to PyArrow 21.0.0+, which requires CMake 3.25+ to build. This can cause issues in environments with older CMake versions (e.g., Amazon Linux 2 notebook instances).","severity":"gotcha","affected_versions":">=3.14.0"},{"fix":"Update your code to handle the new DataFrame output format when reading from DynamoDB, specifically for `wr.dynamodb.read_items`.","message":"The output format for `wr.dynamodb.read_items` changed in version 3.5.0. 
It now returns DynamoDB datatypes within the DataFrame, which can break existing parsing logic.","severity":"gotcha","affected_versions":">=3.5.0"},{"fix":"Increase the memory allocation for your AWS Lambda function to 512MB or higher, depending on your workload's data volume and complexity.","message":"AWS Lambda functions using the `awswrangler` layer with less than 512MB of memory might be insufficient for some data processing workloads, leading to memory-related errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Regularly update `awswrangler` to the latest version to incorporate the most recent security fixes and dependency updates.","message":"Security vulnerabilities in underlying dependencies (e.g., `aiohttp`, `setuptools`, `pg8000`) are frequently fixed in new `awswrangler` releases. Running older versions might expose you to known CVEs.","severity":"gotcha","affected_versions":"Older versions"},{"fix":"Ensure an AWS region is configured. This can be achieved by setting the `AWS_REGION` or `AWS_DEFAULT_REGION` environment variables (e.g., `export AWS_REGION=us-east-1`), configuring it in `~/.aws/config` or `~/.aws/credentials`, or by explicitly passing a `region_name` to your boto3 session or relevant `awswrangler` functions (e.g., `wr.s3.to_parquet(..., boto3_session=boto3.Session(region_name='us-east-1'))`).","message":"AWS SDK operations require an AWS region to be specified. The 'botocore.exceptions.NoRegionError' indicates that no region could be found through environment variables (AWS_REGION, AWS_DEFAULT_REGION), AWS config files (~/.aws/config), or explicit session configuration when awswrangler attempts to interact with AWS services like Glue or S3.","severity":"breaking","affected_versions":"All versions"},{"fix":"Ensure an AWS region is configured in your execution environment (e.g., `export AWS_REGION=us-east-1` or `aws configure`) or pass a boto3 session with a specified region to awswrangler functions.","message":"AWS SDK for pandas (awswrangler) requires an AWS region to be specified for AWS API calls. If not explicitly provided in code (e.g., through a `boto3_session` with a configured region), the region must be set in the environment (e.g., `AWS_REGION` environment variable) or via AWS configuration files (e.g., `~/.aws/config`).","severity":"breaking","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-05-12T14:02:12.784Z","next_check":"2026-06-26T00:00:00.000Z","problems":[{"fix":"Ensure that both pandas and awswrangler are updated to compatible versions. 
You can update them using pip: 'pip install --upgrade pandas awswrangler'.","cause":"This error occurs when the 'infer_compression' function is not found in the 'pandas.io.common' module, possibly due to version incompatibility between pandas and awswrangler.","error":"ImportError: cannot import name 'infer_compression'"},{"fix":"Include awswrangler in your Lambda deployment package by creating a Lambda layer with awswrangler installed, and attach it to your Lambda function.","cause":"This error indicates that the awswrangler module is not found in the AWS Lambda environment, likely because it wasn't included in the deployment package.","error":"Runtime.ImportModuleError: Unable to import module 'index': No module named 'awswrangler'"},{"fix":"Specify the AWS region in your boto3 session by using 'boto3.setup_default_session(region_name=\"your-region\")' before calling awswrangler functions.","cause":"This error occurs when the AWS region is not specified in the boto3 session, which is required for awswrangler operations.","error":"botocore.exceptions.NoRegionError: You must specify a region."},{"fix":"Ensure that the specified S3 path is correct and that the files exist. You can also handle this exception in your code to manage cases where files might be missing.","cause":"This error occurs when awswrangler's 'read_parquet' function cannot find any files at the specified S3 path.","error":"NoFilesFound: No files Found on S3 path: s3://example_bucket/data/parquet_files/y=2021/m=4/d=13/h=170/"},{"fix":"Ensure that both NumPy and awswrangler are updated to compatible versions. You can update them using pip: 'pip install --upgrade numpy awswrangler'.","cause":"This error occurs when there is a version incompatibility between NumPy and awswrangler, leading to missing attributes.","error":"AttributeError: _ARRAY_API not found"}],"ecosystem":"pypi","meta_description":null,"install_score":95,"install_tag":"verified","quickstart_score":0,"quickstart_tag":"stale","pypi_latest":null,"install_checks":{"last_tested":"2026-05-12","tag":"verified","tag_description":"installs cleanly on critical runtimes, fast import, recently tested","results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"redshift,mysql,postgresql","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.52,"mem_mb":55.8,"disk_size":"388.5M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.04,"mem_mb":49.6,"disk_size":"370.0M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"redshift,mysql,postgresql","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.93,"mem_mb":55.8,"disk_size":"363M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.55,"mem_mb":49.6,"disk_size":"344M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"redshift,mysql,postgresql","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":3.3,"mem_mb":62.8,"disk_size":"414.5M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine 
(musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.53,"mem_mb":55.7,"disk_size":"392.4M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"redshift,mysql,postgresql","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.69,"mem_mb":62.8,"disk_size":"382M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.13,"mem_mb":55.7,"disk_size":"360M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"redshift,mysql,postgresql","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.88,"mem_mb":61.2,"disk_size":"407.8M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.33,"mem_mb":54.2,"disk_size":"385.9M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"redshift,mysql,postgresql","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.89,"mem_mb":61.2,"disk_size":"375M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.35,"mem_mb":54.2,"disk_size":"354M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"redshift,mysql,postgresql","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.72,"mem_mb":61.1,"disk_size":"406.8M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.18,"mem_mb":53.9,"disk_size":"384.9M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"redshift,mysql,postgresql","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.77,"mem_mb":61.1,"disk_size":"374M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.26,"mem_mb":53.9,"disk_size":"352M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"redshift,mysql,postgresql","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.38,"mem_mb":56.1,"disk_size":"382.8M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.86,"mem_mb":50.2,"disk_size":"364.5M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"redshift,mysql,postgresql","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":2.17,"mem_mb":56.1,"disk_size":"360M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim 
(glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":1.75,"mem_mb":50.1,"disk_size":"342M"}]},"quickstart_checks":{"last_tested":"2026-04-23","tag":"stale","tag_description":"widespread failures or data too old to trust","results":[{"runtime":"python:3.10-alpine","exit_code":1},{"runtime":"python:3.10-slim","exit_code":1},{"runtime":"python:3.11-alpine","exit_code":1},{"runtime":"python:3.11-slim","exit_code":1},{"runtime":"python:3.12-alpine","exit_code":1},{"runtime":"python:3.12-slim","exit_code":1},{"runtime":"python:3.13-alpine","exit_code":1},{"runtime":"python:3.13-slim","exit_code":1},{"runtime":"python:3.9-alpine","exit_code":1},{"runtime":"python:3.9-slim","exit_code":1}]}}