Soda Core Athena

3.5.6 verified Mon Apr 27 auth: no python

Soda Core Athena is an extension of Soda Core that scans for data quality issues in Amazon Athena. The current version is 3.5.6. It follows the same release cadence as Soda Core.

pip install soda-core-athena
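One way to confirm that the install landed in the active environment is to query package metadata from the standard library. This is a minimal sketch; `installed_version` is a hypothetical helper and not part of Soda.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report both packages that an Athena scan depends on.
for name in ("soda-core", "soda-core-athena"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```

If either line reports `not installed`, the package is missing from the interpreter that ran the check, which may differ from the one where `pip` was invoked.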
error soda.scan.Scan: data source 'athena' not found
cause The Athena data source extension is not installed or not properly registered.
fix
Run `pip install soda-core-athena` in the environment that runs the scan, then verify the import with `from soda.scan import Scan`.
error AttributeError: module 'soda' has no attribute 'scan'
cause Using an older version of Soda Core (<3.0.0) where the scan module was not yet introduced.
fix
Upgrade to Soda Core 3.x: `pip install "soda-core>=3.0.0"`. Quote the requirement so the shell does not treat `>` as an output redirect.
error soda.scan.Scan: 'AthenaScan' object has no attribute 'execute'
cause Attempting to use a deprecated class or incorrect import path.
fix
Use `Scan` from `soda.scan` and call `scan.execute()`.
breaking In Soda Core 3.x, the scan creation API changed from `SodaScan` to `Scan`. The class `SodaScan` was removed.
fix Use `from soda.scan import Scan` and instantiate `Scan()`.
gotcha The Athena data source requires `soda-core-athena` to be installed separately, but the import is still from `soda.scan`. Missing this package will cause a data source configuration error.
fix Ensure `soda-core-athena` is installed in the same environment.
gotcha Athena scanning may fail if the AWS credentials are not properly configured. Soda uses the default AWS credential chain.
fix Set environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and optionally `AWS_SESSION_TOKEN` before running the scan.
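A quick preflight check for the credential variables named above might look like the following. It is a sketch that inspects environment variables only; `missing_aws_env` and `REQUIRED_AWS_VARS` are hypothetical names introduced for illustration.

```python
import os
from typing import List, Mapping

# The two variables the fix above treats as required.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required AWS variables that are unset or empty in env."""
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_env()
if missing:
    print("Missing AWS environment variables: " + ", ".join(missing))
else:
    print("AWS environment variables are set.")
```

A non-empty result is not necessarily an error, since credentials may also come from other sources in the credential chain, such as a shared profile. The check is most useful in CI, where environment variables are typically the only source.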

Basic Soda Core scan using Athena data source.

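from soda.scan import Scan

scan = Scan()
# Must match the data source name defined in configuration.yml
scan.set_data_source_name('athena')
# Connection settings for the Athena data source
scan.add_configuration_yaml_file('configuration.yml')
# SodaCL checks to run against the data source
scan.add_sodacl_yaml_file('checks.yml')
scan.execute()
# Print the full scan log, including check results
print(scan.get_logs_text())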
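The scan above prints its log but does not act on the outcome. A minimal sketch of result handling follows, assuming the `Scan` methods `has_error_logs()`, `get_error_logs_text()`, `has_check_fails()`, and `get_checks_fail_text()` are available as in Soda Core 3.x. `scan_passed` is a hypothetical helper; it takes the scan as a parameter so it can be exercised without a live Athena connection.

```python
def scan_passed(scan) -> bool:
    """Execute a configured scan and report whether it finished cleanly."""
    scan.execute()
    if scan.has_error_logs():
        # Errors usually mean the scan could not run properly
        # (for example, bad configuration or missing credentials).
        print(scan.get_error_logs_text())
        return False
    if scan.has_check_fails():
        # The scan ran, but at least one check failed.
        print(scan.get_checks_fail_text())
        return False
    return True
```

With a scan configured as in the basic example, `scan_passed(scan)` would take the place of the bare `scan.execute()` call, and a CI job could exit with a non-zero status when it returns `False`.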