pandas-redshift

A library for moving data between Amazon Redshift and pandas: it loads query results into a DataFrame and writes DataFrames back to Redshift via an S3 staging step. Version 2.0.5 is the latest release; the release cadence is sporadic. It uses psycopg2 under the hood for connection management.

pip install pandas-redshift
error ModuleNotFoundError: No module named 'pandas_redshift'
cause The package is not installed in the active environment, or a similarly named package ('redshift-pandas' or 'redshift') was installed instead of 'pandas-redshift'.
fix
Run 'pip install pandas-redshift' in the same environment that runs your script.
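To confirm the right distribution is importable, a quick check (a minimal sketch):

try:
    import pandas_redshift
    print('found:', pandas_redshift.__file__)
except ModuleNotFoundError:
    print('missing -- run: pip install pandas-redshift')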
error AttributeError: module 'pandas_redshift' has no attribute 'read_redshift'
cause The read helper is named redshift_to_pandas, not read_redshift; the name may also be missing because a very old version (pre-2.0) or a different package is installed.
fix
Call pr.redshift_to_pandas(sql_query), and upgrade if needed: 'pip install --upgrade pandas-redshift'.
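If in doubt about what your installed version exposes, list the module's public names (a small diagnostic sketch):

import pandas_redshift as pr

# Shows the available helpers, e.g. redshift_to_pandas / pandas_to_redshift
print([name for name in dir(pr) if not name.startswith('_')])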
error psycopg2.OperationalError: could not connect to server
cause Missing or incorrect connection parameters (host, port, dbname, user, password), or the cluster is not reachable from your network (security group, VPC, or publicly-accessible setting).
fix
Verify the endpoint and port (default 5439), pass all five parameters to connect_to_redshift, and confirm the cluster's security group allows inbound connections from your client.
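A connection attempt that surfaces the underlying psycopg2 error can help narrow this down (a minimal sketch; the endpoint and credentials are placeholders):

import psycopg2
import pandas_redshift as pr

try:
    # All five parameters are required; values below are placeholders
    pr.connect_to_redshift(dbname='mydb',
                           host='mycluster.abc123.us-east-1.redshift.amazonaws.com',
                           port=5439,
                           user='myuser',
                           password='mypassword')
except psycopg2.OperationalError as exc:
    # Typical causes: blocked port 5439, wrong endpoint, bad credentials
    print('Connection failed:', exc)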
breaking In version 2.0.0, the package was renamed from 'redshift-pandas' to 'pandas-redshift'. Existing imports of 'redshift_pandas' will break.
fix Change import from 'import redshift_pandas' to 'import pandas_redshift' (see the migration sketch after the deprecation note below).
gotcha pandas_to_redshift always stages the DataFrame as a CSV in S3 before issuing a COPY; if you have not first called connect_to_s3 with valid credentials and a bucket, the write can fail silently or raise a confusing error.
fix Call pr.connect_to_s3(...) with your AWS credentials and staging bucket before writing to Redshift.
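A minimal staging setup, run before any write (the bucket name, subdirectory, and keys below are placeholders):

import pandas_redshift as pr

# Establish the S3 connection used to stage the CSV before the COPY
pr.connect_to_s3(aws_access_key_id='AKIA...',
                 aws_secret_access_key='...',
                 bucket='my-staging-bucket',
                 subdirectory='pandas_redshift')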
deprecated The old 'redshift' module (import redshift) was renamed and deprecated. Support for the old API may be removed in a future release.
fix Use 'import pandas_redshift' instead of 'import redshift'.
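During migration, an import guard can keep older scripts running under either name (a transitional sketch; drop it once everything uses the new name):

try:
    import pandas_redshift as pr
except ModuleNotFoundError:
    # Fall back to the legacy package name noted above
    import redshift_pandas as pr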

Basic example: connect, read query results into a DataFrame, write the DataFrame back to a table via S3 staging, then close the connections.

import pandas_redshift as pr

# Connect to Redshift (psycopg2 connection under the hood)
pr.connect_to_redshift(dbname='mydb',
                       host='mycluster.redshift.amazonaws.com',
                       port=5439,
                       user='myuser',
                       password='mypassword')

# Read query results into a pandas DataFrame
df = pr.redshift_to_pandas("SELECT * FROM my_table LIMIT 100")
print(df.head())

# Write the DataFrame back (uploads a CSV to S3, then COPYs into Redshift)
pr.connect_to_s3(aws_access_key_id='AKIA...',
                 aws_secret_access_key='...',
                 bucket='my-bucket',
                 subdirectory='staging')
pr.pandas_to_redshift(data_frame=df, redshift_table_name='my_table')

# Close the Redshift and S3 connections
pr.close_up_shop()