pandas-redshift
v2.0.5 · verified Mon Apr 27
A library to load data from Amazon Redshift into a pandas DataFrame and write DataFrames back to Redshift. Version 2.0.5 is the latest; release cadence is sporadic. It uses SQLAlchemy under the hood for connection management.
Install: pip install pandas-redshift

Common errors
error: ModuleNotFoundError: No module named 'pandas_redshift'
cause: Installed the deprecated package 'redshift-pandas' (or 'redshift') instead of 'pandas-redshift'.
fix: Run 'pip install pandas-redshift'.
error: AttributeError: module 'pandas_redshift' has no attribute 'read_redshift'
cause: Possibly a very old version (pre-2.0) is installed, or a different package was imported under the same name.
fix: Upgrade to the latest version: pip install --upgrade pandas-redshift
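To confirm the installed version is new enough before importing, a dotted version string can be compared numerically. This is a minimal sketch: the helper name `is_at_least` and the `importlib.metadata` lookup are illustrative, not part of pandas-redshift, and the parser assumes purely numeric dotted versions like "2.0.5".

```python
from importlib.metadata import version, PackageNotFoundError

def is_at_least(installed: str, required: str) -> bool:
    # Compare dotted version strings numerically, e.g. "2.0.5" >= "2.0.0".
    # Assumes purely numeric components (no "post1"/"rc1" suffixes).
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(required)

try:
    ok = is_at_least(version("pandas-redshift"), "2.0.0")
except PackageNotFoundError:
    ok = False  # package not installed at all
```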
error: psycopg2.OperationalError: could not connect to server
cause: Missing or incorrect Redshift connection parameters (host, port, dbname, user, password), or the cluster is not reachable from your network.
fix: Verify the cluster's security group allows inbound traffic on its port (5439 by default) and that the connection dictionary includes all required keys.
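A quick pre-flight check can catch missing keys before psycopg2 ever attempts a connection. The `REQUIRED_KEYS` set and `missing_conn_keys` helper below are a sketch mirroring the Quickstart's connection dictionary, not library API:

```python
REQUIRED_KEYS = {'host', 'port', 'database', 'user', 'password'}

def missing_conn_keys(conn_params: dict) -> set:
    # Return required connection keys that are absent or empty.
    return {k for k in REQUIRED_KEYS if not conn_params.get(k)}

params = {'host': 'mycluster.redshift.amazonaws.com', 'port': 5439,
          'database': 'mydb', 'user': 'myuser'}
print(missing_conn_keys(params))  # 'password' is missing here
```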
Warnings
breaking: In version 2.0.0, the package was renamed from 'redshift-pandas' to 'pandas-redshift'; existing imports of 'redshift_pandas' will break.
fix: Change 'import redshift_pandas' to 'import pandas_redshift'.
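Code that must run on both sides of the 2.0.0 rename can try the new module name first and fall back to the deprecated one. The `load_redshift_lib` wrapper is a hypothetical sketch, not something the library ships:

```python
def load_redshift_lib():
    # Prefer the post-2.0 package name, fall back to the deprecated one.
    try:
        import pandas_redshift as pr
    except ImportError:
        try:
            import redshift_pandas as pr  # pre-2.0 name
        except ImportError:
            return None  # neither package is installed
    return pr

pr = load_redshift_lib()
```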
gotcha: The default behavior of df_to_redshift uses S3 staging; if you don't provide s3_bucket and iam_role, the function will fail silently or raise a confusing error.
fix: Always specify s3_bucket and iam_role (or access_key/secret_key) when writing to Redshift.
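Before calling df_to_redshift, it is worth checking that the S3 staging arguments are complete. The `staging_args_ok` guard below is illustrative, not part of the library; it encodes the rule above: a bucket plus either an IAM role or a full key pair.

```python
def staging_args_ok(s3_bucket=None, iam_role=None,
                    access_key=None, secret_key=None):
    # A bucket is always required; credentials must be either an IAM role
    # or a complete access_key/secret_key pair.
    if not s3_bucket:
        return False
    return bool(iam_role) or bool(access_key and secret_key)
```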
deprecated: The old 'redshift' module (import redshift) was renamed and deprecated. Support for the old API may be removed in a future release.
fix: Use 'import pandas_redshift' instead of 'import redshift'.
Imports
- read_redshift
  wrong:   from pandas_redshift import read_redshift as old_func
  correct: from pandas_redshift import read_redshift
- df_to_redshift
  correct: from pandas_redshift import df_to_redshift
Quickstart
import pandas as pd
import pandas_redshift as pr
# Set Redshift connection parameters
conn_params = {
    'host': 'mycluster.redshift.amazonaws.com',
    'port': 5439,
    'database': 'mydb',
    'user': 'myuser',
    'password': 'mypassword'
}
# Read from Redshift
query = "SELECT * FROM my_table LIMIT 100"
df = pr.read_redshift(query, conn_params=conn_params)
print(df.head())
# Write DataFrame to Redshift (uses csv upload to S3)
pr.df_to_redshift(df, 'my_table', conn_params=conn_params, s3_bucket='my-bucket', iam_role='arn:aws:iam::...')