Soda Core SQL Server
Soda Core SQL Server is a plugin for Soda Core, a data quality monitoring framework. It enables connecting to and scanning data quality checks against SQL Server databases. This library provides the necessary data source integration, allowing users to define checks on their SQL Server data using Soda's YAML-based configuration. Version 3.5.6 is current, and releases typically align with the active development cadence of the main `soda-core` library.
Common errors
-
ModuleNotFoundError: No module named 'soda.scan'
cause `soda-core` (or `soda-core-sqlserver`) is not installed or not in the Python path for the active environment.fixEnsure `soda-core-sqlserver` is installed in your active Python environment: `pip install soda-core-sqlserver`. This command will also install `soda-core` as a dependency. -
pyodbc.Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (0) (SQLDriverConnect)")cause The required ODBC driver for SQL Server is not installed or not correctly configured on the system where Soda Core is running, even if `pyodbc` itself is installed.fixInstall the appropriate Microsoft ODBC Driver for SQL Server for your operating system. For example, on Ubuntu: `sudo apt-get update && sudo apt-get install msodbcsql17 unixodbc-dev`. On macOS: `brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release && brew install msodbcsql17 mssql-tools`. -
Scan failed: No data source was found for data_source: [your_data_source_name] in the configuration file.
cause The data source name specified in the `scan.set_data_source_name()` call (or CLI command) does not match the top-level key used to define the data source in `configuration.yml`.fixVerify that the `data_source` key in your `configuration.yml` file exactly matches the name you provide to the `Scan` object or CLI command. E.g., if `configuration.yml` has `my_sql_server_ds:`, then use `scan.set_data_source_name('my_sql_server_ds')`. -
ERROR: [SODA] Could not connect to data source 'data_source'. Please check the connection details.
cause Authentication or connection details (host, port, database, username, password) are incorrect, or the SQL Server instance is unreachable/misconfigured.fixDouble-check all connection parameters in your `configuration.yml`. Ensure the SQL Server host is accessible from where Soda Core is running (check firewalls and network connectivity). Verify that the provided credentials are valid and have necessary permissions to access the database.
Warnings
- breaking Soda Core v2.x and v3.x introduced significant changes in the configuration file format and CLI commands. `soda-core-sqlserver` versions are compatible with `soda-core` v3.x. Upgrading from v2.x requires updating your `configuration.yml` and `checks.yml` files.
- gotcha While `soda-core-sqlserver` installs `pyodbc`, you still need an appropriate ODBC driver for SQL Server installed on your operating system (e.g., 'ODBC Driver 17 for SQL Server'). Without it, `pyodbc` cannot connect to the database.
- gotcha SQL Server connection strings can be complex. Incorrect host, port, database, username, password, or driver options will lead to connection failures. Often, specific `connection_string_user_defined_options` are needed for trusted connections or custom setups (e.g., `Encrypt=no;TrustServerCertificate=yes;`).
Install
-
pip install soda-core-sqlserver
Imports
- SQL Server Data Source Type (via configuration)
This package enables `type: sqlserver` in your `configuration.yml`. For programmatic scan execution: `from soda.scan import Scan`
Quickstart
import os
from soda.scan import Scan
# Ensure environment variables are set for the quickstart to run
# For local testing, replace os.environ.get with actual values or create a .env file
host = os.environ.get('SQLSERVER_HOST', 'localhost')
port = os.environ.get('SQLSERVER_PORT', '1433')
database = os.environ.get('SQLSERVER_DATABASE', 'your_database')
username = os.environ.get('SQLSERVER_USERNAME', 'sa')
password = os.environ.get('SQLSERVER_PASSWORD', 'your_password')
# Create a dummy configuration.yml and checks.yml for demonstration
# In a real scenario, these files would be persisted.
config_yaml_content = f"""
data_source:
type: sqlserver
host: "{host}"
port: "{port}"
database: "{database}"
username: "{username}"
password: "{password}"
# Add other connection options if needed, e.g., for trusted connections or specific drivers
# Example using system-installed ODBC Driver 17 for SQL Server on Linux/macOS:
# connection_string_user_defined_options: "Driver={{ODBC Driver 17 for SQL Server}};Encrypt=no;TrustServerCertificate=yes;"
# Example for Windows Trusted Connection (if your SQL Server supports it):
# connection_string_user_defined_options: "Trusted_Connection=Yes;Encrypt=no;"
"""
checks_yaml_content = """
checks for demo_table:
- row_count > 0
- missing_count(id) = 0
"""
config_path = 'configuration.yml'
checks_path = 'checks.yml'
with open(config_path, 'w') as f:
f.write(config_yaml_content)
with open(checks_path, 'w') as f:
f.write(checks_yaml_content)
print("Configuration and checks files created. Running Soda Scan...")
scan = Scan()
scan.set_data_source_name('data_source') # Corresponds to the top-level key in configuration.yml
scan.add_configuration_path(config_path)
scan.add_checks_path(checks_path)
scan.execute()
if scan.has_failures():
print("Scan completed with failures.")
else:
print("Scan completed successfully without failures.")
# Clean up generated files (optional, remove in persistent setups)
os.remove(config_path)
os.remove(checks_path)