Soda Core Trino
version 3.5.6, verified Fri May 01, auth: no, language: python
Soda Core Trino is an extension of Soda Core that integrates with Trino/Presto databases for data quality testing. It allows users to define and run checks (e.g., missing values, duplicates, schema changes) using SodaCL. Version 3.5.6 is the latest release. Release cadence is irregular, typically following Soda Core releases.
pip install soda-core-trino
Common errors
error ModuleNotFoundError: No module named 'soda'
cause soda-core (installed as a dependency of soda-core-trino) is missing from the active Python environment
fix
pip install soda-core-trino
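To confirm the fix took effect in the interpreter you actually run scans with, a small check like the sketch below can help. It assumes the top-level import names are soda (from soda-core) and trino (the Trino Python client that soda-core-trino is expected to pull in); missing_modules is a hypothetical helper, not part of Soda.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: 'soda' (soda-core) and 'trino' (Trino Python client).
    missing = missing_modules(["soda", "trino"])
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install soda-core-trino")
    else:
        print("soda and trino are importable")
```

Run it with the same python executable that runs your scans; a different virtual environment is a common reason the error persists after installing.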
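error soda.core.exceptions.DataSourceError: Data source 'my_trino' not found
cause The configuration YAML was never added to the scan, is incorrectly indented, or declares a different data source name
fix
Call scan.add_configuration_yaml_str with YAML that declares 'data_source my_trino:' and indents 'type: trino' and the connection keys beneath it. The name must match the one passed to scan.set_data_source_name.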
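Indentation mistakes are easier to avoid when the YAML is generated rather than typed inside an f-string. The sketch below uses a hypothetical helper, build_trino_config, that is not part of Soda. It assumes every value can be written as a JSON scalar, which YAML also accepts, and that the connection keys match the ones used in the Quickstart.

```python
import json


def build_trino_config(name, **props):
    """Render a Soda data source block with consistent two-space indentation."""
    lines = [f"data_source {name}:", "  type: trino"]
    for key, value in props.items():
        # json.dumps quotes strings and leaves numbers bare; both are valid YAML.
        lines.append(f"  {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_trino_config("my_trino", host="localhost", port=8080, catalog="tpch"))
```

The returned string can be passed to scan.add_configuration_yaml_str, with the same name given to scan.set_data_source_name.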
error trino.exceptions.TrinoConnectionError: Failed to connect to ...: Connection refused
cause The Trino server is not running, or the configured host or port is wrong
fix
Confirm the Trino server is running, check the host and port in the configuration, and make sure the machine running Soda can reach them over the network.
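A plain TCP probe separates "nothing is listening on that host and port" from problems inside Soda or Trino. The following is a minimal sketch using only the standard library; can_connect is a hypothetical helper, and the TRINO_HOST and TRINO_PORT defaults mirror the Quickstart.

```python
import os
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    host = os.environ.get("TRINO_HOST", "localhost")
    port = os.environ.get("TRINO_PORT", "8080")
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

A successful probe only shows that the port is open; it does not prove that the listener is Trino or that credentials are valid.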
Warnings
breaking In Soda Core 3.x, the API changed significantly from 2.x. The Scan object replaced the old SodaServerClient. Direct imports of data source classes are deprecated.
fix Use Scan object with configuration YAML or dictionary instead of importing data source classes directly.
gotcha Trino password authentication requires SSL/TLS by default; if your cluster doesn't use SSL, set 'verify' to False in the config.
fix Add 'verify: false' under the data source configuration or set environment variable SODA_VERIFY=false.
gotcha The Trino host and port must be reachable from the environment where Soda runs; connection errors often stem from network/firewall issues.
fix Verify network connectivity with a simple trino client test before using Soda.
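One way to run such a client test is with the Trino Python client's DB-API interface, as in the sketch below. It assumes the trino package is installed and an unauthenticated cluster. The helper names and the TRINO_HTTP_SCHEME variable are hypothetical; the other environment variables mirror the Quickstart.

```python
import os


def connection_kwargs(env=None):
    """Build keyword arguments for trino.dbapi.connect from environment settings."""
    env = os.environ if env is None else env
    return {
        "host": env.get("TRINO_HOST", "localhost"),
        "port": int(env.get("TRINO_PORT", "8080")),
        "user": env.get("TRINO_USER", "admin"),
        "catalog": env.get("TRINO_CATALOG", "tpch"),
        "schema": env.get("TRINO_SCHEMA", "sf1"),
        # Hypothetical variable; use "https" for TLS-enabled clusters.
        "http_scheme": env.get("TRINO_HTTP_SCHEME", "http"),
    }


def trino_smoke_test(env=None):
    """Run SELECT 1 against Trino and return the fetched rows."""
    # Imported lazily so this module loads even where trino is not installed.
    from trino.dbapi import connect

    conn = connect(**connection_kwargs(env))
    try:
        cur = conn.cursor()
        cur.execute("SELECT 1")
        return cur.fetchall()
    finally:
        conn.close()
```

If trino_smoke_test() raises the same connection error that Soda reports, the problem lies in the network or the cluster rather than in the Soda configuration.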
Imports
- TrinoDataSource
wrong: from soda_core_trino import TrinoDataSource
correct: from soda.scan import Scan  # then configure data_source with type: trino
Quickstart
import os

from soda.scan import Scan

scan = Scan()
scan.set_scan_definition_name('test')
scan.set_data_source_name('my_trino')

# Connection settings come from environment variables, with local defaults.
# Keys under the data source must be indented for the YAML to parse.
scan.add_configuration_yaml_str(f'''
data_source my_trino:
  type: trino
  host: {os.environ.get('TRINO_HOST', 'localhost')}
  port: {os.environ.get('TRINO_PORT', '8080')}
  catalog: {os.environ.get('TRINO_CATALOG', 'tpch')}
  schema: {os.environ.get('TRINO_SCHEMA', 'sf1')}
  username: {os.environ.get('TRINO_USER', 'admin')}
  password: {os.environ.get('TRINO_PASSWORD', '')}
''')

# SodaCL check: the orders table must contain at least one row.
scan.add_sodacl_yaml_str('''
checks for orders:
  - row_count > 0
''')

scan.execute()
print(scan.get_logs_text())