{"id":2291,"library":"soda-core","title":"Soda Core","description":"Soda Core is an open-source command-line tool and Python library for data quality testing and monitoring. Users define data quality checks (e.g., freshness, uniqueness, validity) and run them as scans against supported data sources to surface issues. Version 4.3.0 is the current release. The project is actively developed with community contributions and requires Python >=3.10.","status":"active","version":"4.3.0","language":"en","source_language":"en","source_url":"https://github.com/sodadata/soda-core","tags":["data quality","data monitoring","data observability","data integrity","analytics engineering"],"install":[{"cmd":"pip install soda-core","lang":"bash","label":"Install core library"}],"dependencies":[{"reason":"Required for scanning PostgreSQL data sources.","package":"soda-core-postgres","optional":true},{"reason":"Required for scanning Google BigQuery data sources.","package":"soda-core-bigquery","optional":true},{"reason":"Required for scanning Snowflake data sources.","package":"soda-core-snowflake","optional":true},{"reason":"Required for scanning pandas or Dask DataFrames, as in the quickstart example.","package":"soda-core-pandas-dask","optional":true},{"reason":"Used in the quickstart example to create and load a sample CSV file.","package":"pandas","optional":true}],"imports":[{"note":"The top-level package for the `Scan` class is `soda`, not `soda_core`.","wrong":"from soda_core.scan import Scan","symbol":"Scan","correct":"from soda.scan import Scan"}],"quickstart":{"code":"import os\nimport pandas as pd\nfrom soda.scan import Scan\n\n# Create a small CSV file for the example\ncsv_filename = \"sample_data.csv\"\npd.DataFrame({\n    'id': [1, 2, 3, 4, 5],\n    'value': ['A', 'B', 'C', 'D', 'E'],\n    'status': ['active', 'inactive', 'active', 'active', None]\n}).to_csv(csv_filename, index=False)\n\n# Load the CSV into a pandas DataFrame (scanning DataFrames requires soda-core-pandas-dask)\ndf = pd.read_csv(csv_filename)\n\n# Define SodaCL checks as a string; the dataset name must match\n# the dataset_name registered with add_pandas_dataframe() below\nsodacl_checks = \"\"\"\nchecks for sample_data:\n  - row_count > 0\n  - missing_count(status) = 1\n  - duplicate_count(id) = 0\n\"\"\"\n\n# Run the Soda Core scan programmatically\nscan = Scan()\nscan.set_verbose(True)  # Optional: for more detailed output\nscan.add_pandas_dataframe(dataset_name=\"sample_data\", pandas_df=df)\nscan.set_data_source_name(\"dask\")  # Default data source name used for DataFrames\nscan.add_sodacl_yaml_str(sodacl_checks)\nscan.execute()\n\nprint(\"\\n--- Scan Results ---\")\nprint(scan.get_logs_text())\nif scan.has_check_fails():\n    print(\"Scan completed with failing checks.\")\nelse:\n    print(\"Scan completed; all checks passed.\")\n\n# Clean up the CSV file\nos.remove(csv_filename)\n","lang":"python","description":"This quickstart runs a programmatic Soda Core scan against a local CSV file. It writes a small sample CSV with pandas, loads it into a DataFrame, registers that DataFrame as the dataset `sample_data`, executes SodaCL checks defined as a string, and prints the scan logs and a pass/fail summary. Install the extra packages first with `pip install pandas soda-core-pandas-dask`."},"warnings":[{"fix":"Before upgrading, review the official Soda Core 4.x documentation and migration guide, and update `Scan` usage, configuration files, and check definitions accordingly. To keep existing 3.x code running in the meantime, pin the version (e.g., `pip install \"soda-core<4\"`).","message":"Soda Core 4.0.0 introduced breaking changes relative to 3.x: the `Scan` object's methods, configuration file naming, and SodaCL syntax were revised, so scripts and check files written for 3.x may not run unchanged on 4.x.","severity":"breaking","affected_versions":">=4.0.0 (when migrating from <4.0.0)"},{"fix":"Install the matching `soda-core-<data-source>` package for each data source, e.g., `pip install soda-core-postgres`.","message":"Soda Core does not bundle database drivers. Install the specific `soda-core-<data-source>` package (e.g., `soda-core-postgres`, `soda-core-bigquery`, `soda-core-snowflake`) for each data source you intend to scan; without it, scans against that data source fail with connection errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always pass `-c` explicitly on the CLI, or, in Python, call `scan.set_data_source_name()` together with `scan.add_configuration_yaml_file()` or `scan.add_configuration_yaml_str()`, so every scan uses a predictable configuration.","message":"In the `soda scan` CLI, `-d` takes the data source name (not a file path), `-c` takes the configuration YAML file, and check files are passed as positional arguments, e.g., `soda scan -d my_datasource -c configuration.yml checks.yml`. If `-c` is omitted, Soda Core falls back to the default configuration file `~/.soda/configuration.yml`, which may define a different data source than intended.","severity":"gotcha","affected_versions":"<4.0.0"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}