{"id":5488,"library":"soda-core-duckdb","title":"Soda Core DuckDB Connector","description":"soda-core-duckdb is a Python connector that enables Soda Core, an open-source data quality and data contract verification engine, to connect and run data quality checks against DuckDB databases. It facilitates defining data quality expectations in YAML (SodaCL) and executing scans programmatically or via CLI to validate data. The library is actively maintained as part of the broader Soda Core ecosystem, which sees frequent updates and new feature releases.","status":"active","version":"3.5.6","language":"en","source_language":"en","source_url":"https://github.com/sodadata/soda-core","tags":["data quality","data contracts","duckdb","etl","observability","data validation"],"install":[{"cmd":"pip install soda-core-duckdb","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core library for data quality checks and scan execution.","package":"soda-core"},{"reason":"Database engine connector for DuckDB instances.","package":"duckdb"}],"imports":[{"note":"Programmatic interaction with Soda Core for DuckDB is typically done through the central `Scan` class from `soda.scan`, not directly from the connector package.","wrong":"from soda_core_duckdb import ...","symbol":"Scan","correct":"from soda.scan import Scan"}],"quickstart":{"code":"import os\nimport duckdb\nfrom soda.scan import Scan\n\n# 1. Create a dummy DuckDB database and a table\ncon = duckdb.connect(database=':memory:', read_only=False)\ncon.execute(\"CREATE TABLE my_table (id INTEGER, name VARCHAR);\")\ncon.execute(\"INSERT INTO my_table VALUES (1, 'Alice'), (2, 'Bob'), (3, NULL);\")\n\n# 2. Define a data source configuration (optional for in-memory, but good practice)\n# This would typically be in a configuration.yml file\n# ds_config_content = \"\"\"\n# data_source my_duckdb:\n#   type: duckdb\n#   connection:\n#     database: ':memory:'\n# \"\"\"\n\n# 3. 
Define SodaCL checks in a checks.yml file\nchecks_content = \"\"\"\nchecks for my_table:\n  - row_count > 0\n  - missing_count(name) = 1\n  - duplicate_count(id) = 0\n\"\"\"\n\nwith open('checks.yml', 'w') as f:\n    f.write(checks_content)\n\n# 4. Programmatically run a Soda scan\nscan = Scan()\n# Register the open connection under a logical data source name\nscan.add_duckdb_connection(duckdb_connection=con, data_source_name='my_duckdb_source')\nscan.set_data_source_name('my_duckdb_source')  # Must match the name registered above\nscan.add_sodacl_yaml_file('checks.yml')\n\nprint('Running Soda scan...')\nscan.execute()\n\nif scan.has_check_fails():\n    print('Scan failed!')\n    # Optionally, you can assert or raise an error\n    # scan.assert_no_checks_fail()\nelse:\n    print('Scan successful: all checks passed or warned.')\n\nprint(scan.get_logs_text())\n\n# Clean up temporary files\nos.remove('checks.yml')\ncon.close()\n","lang":"python","description":"This quickstart sets up an in-memory DuckDB database, defines data quality checks using SodaCL in a `checks.yml` file, and executes a programmatic scan with the `soda.scan.Scan` class to validate the data. It checks for a positive row count, exactly one missing value in the `name` column, and no duplicate values in the `id` column."},"warnings":[{"fix":"Refer to Soda Core's official migration guides when upgrading to v4 to convert existing `checks.yml` files to the new `contract.yml` format. For `soda-core-duckdb 3.5.6`, continue using the checks language.","message":"Soda Core v4 introduced a breaking change, moving from a 'checks language' (used in `checks.yml`) to a 'Data Contracts-based syntax' (`contract.yml`). 
Users upgrading from Soda Core v3.x to v4.x will need to migrate their data quality definitions.","severity":"breaking","affected_versions":"soda-core < 4.0.0 (including 3.x)"},{"fix":"If `database: 'your_file.duckdb'` causes issues, try `connection: path: 'your_file.duckdb'` in your data source configuration YAML.","message":"When defining DuckDB connections in `configuration.yml` for Soda Core v3, some users have reported issues when using the `database` key to specify the DuckDB file path. Using `path` instead often resolves the problem.","severity":"gotcha","affected_versions":"soda-core-duckdb 3.x"},{"fix":"Always check the `soda-core` and `soda-core-duckdb` release notes or PyPI `requires` section for exact `duckdb` version compatibility. Pin your `duckdb` dependency if encountering conflicts.","message":"Specific versions of `soda-core` (and by extension `soda-core-duckdb`) might have strict `duckdb` version requirements. For example, `soda-core` v3.5.0 relaxed its `duckdb` dependency to `<1.1.0`.","severity":"gotcha","affected_versions":"soda-core-duckdb 3.x, especially around 3.5.0"},{"fix":"This issue is often resolved by updating `protobuf` or related dependencies. If the issue persists, consult Soda Core's troubleshooting documentation for specific dependency pinning or exclusion advice.","message":"An `ImportError: dlopen(.../site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so, 0x0002): symbol not found in flat namespace` can occur due to a transitive dependency from `opentelemetry` that gathers OSS usage statistics in Soda Core 3.x.","severity":"gotcha","affected_versions":"soda-core 3.x, notably 3.0.9"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}