{"id":10241,"library":"soda-core-redshift","title":"Soda Core Redshift","description":"Soda Core Redshift is a plugin for Soda Core that enables data quality checks against Amazon Redshift data warehouses. It provides the necessary connector to allow Soda Core to interact with Redshift, fetch metadata, and execute SQL queries for data quality monitoring. The current version is 3.5.6, and it typically follows the release cycle of the main `soda-core` library, with frequent updates.","status":"active","version":"3.5.6","language":"en","source_language":"en","source_url":"https://github.com/sodadata/soda-core-redshift","tags":["data quality","redshift","aws","data governance","etl"],"install":[{"cmd":"pip install soda-core-redshift","lang":"bash","label":"Install Soda Core Redshift"}],"dependencies":[{"reason":"Soda Core Redshift is a driver for the main Soda Core library.","package":"soda-core","optional":false},{"reason":"Required for connecting to Redshift; typically installed as a dependency of soda-core-redshift.","package":"psycopg2-binary","optional":false}],"imports":[{"note":"Soda Core Redshift is a driver and does not expose direct Python symbols for user import. 
Its functionality is enabled when `soda.scan.Scan` is configured to connect to a Redshift data source, making it an implicit dependency for the `Scan` class.","symbol":"Scan","correct":"from soda.scan import Scan"}],"quickstart":{"code":"import os\nimport sys\n\nfrom soda.scan import Scan\n\n# Read Redshift connection details from environment variables; the fallbacks are placeholders\nredshift_host = os.environ.get('REDSHIFT_HOST', 'your_redshift_host.com')\nredshift_port = os.environ.get('REDSHIFT_PORT', '5439')\nredshift_database = os.environ.get('REDSHIFT_DATABASE', 'your_db_name')\nredshift_username = os.environ.get('REDSHIFT_USERNAME', 'your_username')\nredshift_password = os.environ.get('REDSHIFT_PASSWORD', 'your_password')\n\n# Define the Soda Core configuration as a string\nconfiguration_yaml = f'''\ndata_source redshift:\n  type: redshift\n  host: {redshift_host}\n  port: {redshift_port}\n  database: {redshift_database}\n  username: {redshift_username}\n  password: {redshift_password}\n'''\n\n# Define data quality checks as a string\nchecks_yaml = '''\nchecks for dim_users:\n  - row_count > 0\n  - duplicate_count(user_id) = 0\n  - missing_count(email) = 0\n'''\n\n# Initialize and execute the Soda Scan\nscan = Scan()\nscan.set_data_source_name('redshift')\nscan.add_configuration_yaml_str(configuration_yaml)\nscan.add_checks_yaml_str(checks_yaml)\n\nprint(\"Running Soda Scan...\")\nscan.execute()\n\n# Check error logs first so execution errors are never masked by check warnings\nif scan.has_error_logs():\n    print(\"Scan finished with errors.\")\n    sys.exit(1)\nelif scan.has_check_fails():\n    print(\"Scan finished with failures.\")\n    sys.exit(1)\nelif scan.has_check_warns():\n    print(\"Scan finished with warnings.\")\nelse:\n    print(\"Scan finished successfully.\")\n","lang":"python","description":"This quickstart demonstrates how to run a Soda Core scan against a Redshift data source using Python. It configures the Redshift connection details, defines three simple checks for a 'dim_users' table, and executes the scan. 
Connection details are pulled from environment variables for secure credential management."},"warnings":[{"fix":"Always install `soda-core-redshift` and `soda-core` at compatible versions, ideally by installing `soda-core-redshift` which pulls in the correct `soda-core` version, or by keeping both up-to-date.","message":"Ensure `soda-core` and `soda-core-redshift` versions are compatible. While minor version mismatches often work, major version mismatches can lead to unexpected behavior or errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always use environment variables (e.g., `os.environ.get()` in Python) or a secrets manager to pass credentials to Soda Core, especially in production environments.","message":"Redshift connection details (host, port, database, username, password) are sensitive. Hardcoding them in configuration files is a security risk.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Verify that your Redshift cluster is accessible from where Soda Core is running (e.g., check security groups, network ACLs). Double-check host, port, database name, username, and password. Ensure the user has appropriate permissions on the database and tables being scanned.","message":"Connectivity issues to Redshift are often due to network firewalls, security groups, or incorrect database credentials/permissions.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Ensure `psycopg2-binary` is installed: `pip install psycopg2-binary`. 
If using a specific `psycopg2` version, ensure it is built with necessary system libraries for your OS (e.g., `libpq-dev` on Debian/Ubuntu, `postgresql-devel` on CentOS/RHEL).","cause":"The `psycopg2-binary` package, which provides the PostgreSQL adapter needed for Redshift, is either not installed or its dependencies are missing.","error":"ModuleNotFoundError: No module named 'psycopg2'"},{"fix":"Verify that your `configuration.yml` contains a `data_source` entry with `type: redshift` and that the name matches the one used in `scan.set_data_source_name('redshift')`.","cause":"The `data_source` section in your `configuration.yml` (or configuration string) does not correctly define a data source named 'redshift', or the `type` is misspelled, or the `set_data_source_name()` in Python does not match.","error":"SodaException: Data source 'redshift' not found in configuration"},{"fix":"Double-check the `username` and `password` used in your configuration against your Redshift cluster's user credentials. Ensure there are no typos or leading/trailing spaces. Confirm the user exists and has permission to connect.","cause":"Incorrect username or password provided for the Redshift connection.","error":"FATAL: password authentication failed for user \"your_username\""},{"fix":"Verify the `host` value is correct and fully qualified. Check your network configuration and DNS settings. Ensure there are no firewalls or security groups blocking outbound connections to Redshift from where Soda Core is running.","cause":"The provided Redshift host name cannot be resolved by DNS, or there's a network issue preventing access.","error":"Error: Could not connect to Redshift database: could not translate host name \"your_redshift_host.com\" to address: Name or service not known"}]}