Soda Core Redshift

3.5.6 · active · verified Fri Apr 17

Soda Core Redshift is a Soda Core plugin for running data quality checks against Amazon Redshift data warehouses. It supplies the connector that lets Soda Core connect to Redshift, fetch metadata, and execute the SQL queries behind each check. The current version is 3.5.6; releases typically follow the release cycle of the main `soda-core` library, which is updated frequently.
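
Because the plugin's releases track `soda-core`, it can help to confirm which versions are actually installed before running a scan. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Soda Core, and it assumes the distributions are named `soda-core` and `soda-core-redshift`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "soda-core-redshift") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report both distributions so a version mismatch is easy to spot.
    for name in ("soda-core", "soda-core-redshift"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If the two versions differ, reinstalling the plugin usually pulls in a matching `soda-core`, though that depends on how the environment was pinned.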

Quickstart

This quickstart demonstrates how to run a Soda Core scan against a Redshift data source from Python. It configures the Redshift connection, defines a few simple checks for a `dim_users` table, and executes the scan. Connection details are read from environment variables so that credentials stay out of the source code; the fallback values in the script are placeholders only and must be replaced or overridden before it will connect.

import os
from soda.scan import Scan

# Configure Redshift connection details using environment variables for security
redshift_host = os.environ.get('REDSHIFT_HOST', 'your_redshift_host.com')
redshift_port = os.environ.get('REDSHIFT_PORT', '5439')
redshift_database = os.environ.get('REDSHIFT_DATABASE', 'your_db_name')
redshift_username = os.environ.get('REDSHIFT_USERNAME', 'your_username')
redshift_password = os.environ.get('REDSHIFT_PASSWORD', 'your_password')

# Define the Soda Core configuration as a string
configuration_yaml = f'''
data_source redshift:
  type: redshift
  host: {redshift_host}
  port: {redshift_port}
  database: {redshift_database}
  username: {redshift_username}
  password: {redshift_password}
'''

# Define data quality checks as a string
checks_yaml = '''
checks for dim_users:
  - row_count > 0
  - duplicate_count(user_id) = 0
  - missing_count(email) = 0
'''

# Initialize and execute the Soda Scan
scan = Scan()
scan.set_data_source_name('redshift')
scan.add_configuration_yaml_str(configuration_yaml)
scan.add_sodacl_yaml_str(checks_yaml)

print("Running Soda Scan...")
scan.execute()

# Check runtime errors first so they are not masked by warnings
if scan.has_error_logs():
    print("Scan finished with errors.")
    raise SystemExit(1)
elif scan.has_check_fails():
    print("Scan finished with failures.")
    raise SystemExit(1)
elif scan.has_check_warns():
    print("Scan finished with warnings.")
else:
    print("Scan finished successfully.")
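
The quickstart falls back to placeholder values when an environment variable is missing, which can hide a misconfigured environment until the connection fails. A fail-fast variant might look like the sketch below. The helper `build_redshift_config` and the `REQUIRED_VARS` tuple are hypothetical names, not part of Soda Core. The sketch also assumes that Soda Core resolves `${VAR}` references in configuration YAML at scan time, as the Soda documentation describes, so the username and password never enter the Python string; verify this for the version in use.

```python
import os
from typing import Mapping

# Variables that must be set; REDSHIFT_PORT is optional and defaults to 5439.
REQUIRED_VARS = (
    "REDSHIFT_HOST",
    "REDSHIFT_DATABASE",
    "REDSHIFT_USERNAME",
    "REDSHIFT_PASSWORD",
)


def build_redshift_config(env: Mapping[str, str] = os.environ) -> str:
    """Build the data source YAML, raising if a required variable is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    port = env.get("REDSHIFT_PORT", "5439")
    return (
        "data_source redshift:\n"
        "  type: redshift\n"
        f"  host: {env['REDSHIFT_HOST']}\n"
        f"  port: {port}\n"
        f"  database: {env['REDSHIFT_DATABASE']}\n"
        # Assumption: Soda Core substitutes these references itself.
        "  username: ${REDSHIFT_USERNAME}\n"
        "  password: ${REDSHIFT_PASSWORD}\n"
    )
```

The returned string can be passed to `scan.add_configuration_yaml_str(...)` in place of the f-string used in the quickstart.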

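A scan can end in more than one state at once, for example with both warnings and runtime errors, so the order in which the outcomes are checked decides the exit status. One way to make that ordering explicit is a small pure function, sketched below. The name `scan_exit_code` is hypothetical, and the numeric codes are an assumption loosely modeled on the exit codes the Soda CLI reports; check them against the Soda documentation before relying on them in a pipeline.

```python
def scan_exit_code(has_errors: bool, has_fails: bool, has_warns: bool) -> int:
    """Map scan outcomes to an exit code, most severe outcome first."""
    if has_errors:
        return 3  # the scan itself hit a runtime problem
    if has_fails:
        return 2  # at least one check failed
    if has_warns:
        return 1  # at least one check warned
    return 0  # everything passed


if __name__ == "__main__":
    # Warnings plus errors resolve to the error code, not the warning code.
    print(scan_exit_code(has_errors=True, has_fails=False, has_warns=True))
```

In a real script the three booleans would come from the scan object after `scan.execute()`, and the result would be handed to `sys.exit`.
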