{"id":10240,"library":"soda-core-bigquery","title":"Soda Core BigQuery","description":"Soda Core BigQuery is an extension for Soda Core, an open-source data quality testing tool. It enables users to define, execute, and monitor data quality checks directly against data stored in Google BigQuery. This package provides the necessary connector and SQL dialect definitions to interact with BigQuery, allowing for comprehensive data quality assessments within a Python environment, typically managed through the Soda CLI and YAML configuration files. The current version is 3.5.6, and it follows the release cadence of the broader Soda Core project.","status":"active","version":"3.5.6","language":"en","source_language":"en","source_url":"https://github.com/sodadata/soda-core","tags":["data quality","bigquery","data governance","testing","analytics","sql","google cloud"],"install":[{"cmd":"pip install soda-core-bigquery","lang":"bash","label":"Install `soda-core-bigquery`"}],"dependencies":[{"reason":"Provides the core Soda CLI, scan execution engine, and base API functionalities; `soda-core-bigquery` is an extension.","package":"soda-core"}],"imports":[{"note":"While `soda-core-bigquery` is primarily used via CLI and YAML, programmatic interaction with Soda Core's scan capabilities is done through the `Scan` object.","symbol":"Scan","correct":"from soda.scan import Scan"}],"quickstart":{"code":"import os\nfrom soda.scan import Scan\n\n# Configure your GCP project ID. For local execution, ensure:\n# 1. 'GOOGLE_APPLICATION_CREDENTIALS' env var points to a service account key JSON,\n#    OR 2. 
'gcloud auth application-default login' has been run.\n# os.environ['BIGQUERY_PROJECT_ID'] = 'your-gcp-project-id'\n\n# Initialize a Soda Scan and select the data source to scan\nscan = Scan()\nscan.set_data_source_name('bigquery_source')\n\n# Add BigQuery data source configuration via a YAML string\n# Replace 'my-dummy-project' and 'my_dataset' with your actual GCP project ID and dataset\n# use_context_auth tells Soda to use application default credentials\nscan.add_configuration_yaml_str(f'''\ndata_source bigquery_source:\n  type: bigquery\n  use_context_auth: true\n  project_id: {os.environ.get('BIGQUERY_PROJECT_ID', 'my-dummy-project')}\n  dataset: my_dataset\n''')\n\n# Define data quality checks (SodaCL) via a YAML string\n# Replace 'my_table' with an actual table in the configured dataset for real checks\nscan.add_sodacl_yaml_str('''\nchecks for my_table:\n  - row_count > 0: # Checks if the table has any rows\n      name: 'Table should not be empty'\n  - missing_count(id) = 0: # Checks for missing values in 'id' column\n      name: 'No missing IDs'\n  - duplicate_count(id) = 0: # Checks for duplicate values in 'id' column\n      name: 'No duplicate IDs'\n''')\n\nprint(\"Running Soda Scan...\")\nscan.execute()\n\nif scan.has_check_fails():\n    print(\"\\nSoda Scan finished with failures.\")\nelse:\n    print(\"\\nSoda Scan finished with no failures.\")\n\n# You can inspect the scan results for detailed outcomes\n# print(scan.get_scan_results())","lang":"python","description":"This quickstart demonstrates how to run a Soda Scan programmatically against a BigQuery data source. It configures the BigQuery connection and defines simple data quality checks. For this to run successfully against your data, ensure BigQuery authentication is set up (e.g., `GOOGLE_APPLICATION_CREDENTIALS` environment variable or `gcloud auth application-default login`) and replace placeholder values for `project_id`, `dataset`, and `table` with your actual BigQuery resources."},"warnings":[{"fix":"Refer to the official Soda Core 3.x migration guide for CLI and YAML syntax updates. 
Always test configurations in a development environment before deploying to production.","message":"Soda Core 3.x introduced significant changes to the CLI commands (e.g., `soda analyze` was removed and `soda scan` takes different arguments) and the structure of configuration YAML files compared to 2.x versions (Soda SQL). Migrating from older versions requires updating commands and YAML definitions.","severity":"breaking","affected_versions":"2.x to 3.x"},{"fix":"Ensure the service account or user account has at least `BigQuery Data Viewer` (to read data) and `BigQuery Job User` (to run queries) roles. For service accounts, verify `GOOGLE_APPLICATION_CREDENTIALS` points to a valid key JSON file. For user credentials, ensure `gcloud auth application-default login` has been executed.","message":"Proper BigQuery authentication and IAM roles are critical. Common issues include incorrect service account keys, missing `GOOGLE_APPLICATION_CREDENTIALS` environment variable, or insufficient IAM permissions for the service account/user.","severity":"gotcha","affected_versions":"All"},{"fix":"If encountering issues, explicitly check `pip show soda-core` and `pip show soda-core-bigquery` to confirm both are installed and compatible. Rarely, direct `pip install soda-core` might be needed if `soda-core-bigquery`'s dependency resolution fails.","message":"`soda-core-bigquery` is an extension package. While `pip install soda-core-bigquery` typically pulls `soda-core` as a dependency, the core functionalities (CLI, `Scan` object) are provided by `soda-core`. Ensure `soda-core` is available and compatible.","severity":"gotcha","affected_versions":"All"},{"fix":"Use a YAML linter or validator. Carefully review the Soda Core documentation for the exact YAML structure and examples. Pay close attention to spacing and nesting, especially for data source definitions and check configurations.","message":"YAML configuration files (`configuration.yml`, `checks.yml`) are highly sensitive to indentation and syntax. 
Minor errors can lead to failures, unexecuted checks, or incorrect interpretations without clear error messages.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Ensure `soda-core` is installed in your environment. Running `pip install soda-core-bigquery` should install `soda-core` as a dependency, but if not, try `pip install soda-core` directly.","cause":"The base `soda-core` package, which provides the core `Scan` object and CLI, is either not installed or is an incompatible version.","error":"ModuleNotFoundError: No module named 'soda.scan'"},{"fix":"Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account key JSON file, or authenticate your user by running `gcloud auth application-default login`.","cause":"Soda Core cannot find valid Google Cloud authentication credentials in the execution environment or specified configuration.","error":"google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials."},{"fix":"Double-check the project ID, dataset ID, and table name for typos. Verify that the BigQuery credentials have `BigQuery Data Viewer` role for the data and `BigQuery Job User` for running queries on the target project/dataset.","cause":"The specified BigQuery project, dataset, or table in your `configuration.yml` or `checks.yml` does not exist, or the authenticated user/service account lacks permissions to access it.","error":"google.api_core.exceptions.NotFound: 404 Not found: Dataset <your-project>:<your-dataset>"},{"fix":"Review your `configuration.yml` (or `add_configuration_yaml_str` content) carefully. 
Ensure `type: bigquery` is correctly specified and that `project_id` (if required) and `dataset` are present and accurate, paying attention to YAML indentation.","cause":"There is a syntax error, missing required field, or incorrect type in your `data_source` configuration within `configuration.yml` or the programmatic configuration string.","error":"ERROR: Data source 'your_data_source_name' in configuration is not valid."}]}