{"library":"soda-core-bigquery","title":"Soda Core BigQuery","type":"library","description":"Soda Core BigQuery is an extension for Soda Core, an open-source data quality testing tool. It enables users to define, execute, and monitor data quality checks directly against data stored in Google BigQuery. This package provides the necessary connector and SQL dialect definitions to interact with BigQuery, allowing for comprehensive data quality assessments within a Python environment, typically managed through the Soda CLI and YAML configuration files. The current version is 3.5.6, and it follows the release cadence of the broader Soda Core project.","language":"python","status":"active","last_verified":"Fri Apr 17","install":{"commands":["pip install soda-core-bigquery"],"cli":{"name":"soda","version":"soda-core, version 3.5.6"}},"imports":["from soda.scan import Scan"],"auth":{"required":false,"env_vars":[]},"links":{"homepage":"https://www.soda.io","github":null,"docs":null,"changelog":null,"pypi":"https://pypi.org/project/soda-core-bigquery/","npm":null,"openapi_spec":null,"status_page":null,"smithery":null},"quickstart":{"code":"import os\nfrom soda.scan import Scan\n\n# Configure your GCP project ID. For local execution, ensure:\n# 1. 'GOOGLE_APPLICATION_CREDENTIALS' env var points to a service account key JSON,\n#    OR 2. 
'gcloud auth application-default login' has been run.\n# os.environ['BIGQUERY_PROJECT_ID'] = 'your-gcp-project-id'\nproject_id = os.environ.get('BIGQUERY_PROJECT_ID', 'my-dummy-project')\n\n# Initialize a Soda Scan and select the data source defined below\nscan = Scan()\nscan.set_data_source_name('bigquery_source')\n\n# Add BigQuery data source configuration via a YAML string\n# Replace 'my-dummy-project' and 'my_dataset' with your actual GCP project ID and dataset\n# use_context_auth tells Soda to use the application default credentials described above\nscan.add_configuration_yaml_str(f'''\ndata_source bigquery_source:\n  type: bigquery\n  use_context_auth: true\n  project_id: {project_id}\n  dataset: my_dataset\n''')\n\n# Define data quality checks via a SodaCL YAML string\n# Replace 'my_table' with an actual table in the configured dataset\nscan.add_sodacl_yaml_str('''\nchecks for my_table:\n  - row_count > 0: # Checks if the table has any rows\n      name: 'Table should not be empty'\n  - missing_count(id) = 0: # Checks for missing values in 'id' column\n      name: 'No missing IDs'\n  - duplicate_count(id) = 0: # Checks for duplicate values in 'id' column\n      name: 'No duplicate IDs'\n''')\n\nprint(\"Running Soda Scan...\")\nscan.execute()\n\n# has_check_fails() is True when at least one check ended in a 'fail' state\nif scan.has_check_fails():\n    print(\"\\nSoda Scan finished with failures.\")\nelse:\n    print(\"\\nSoda Scan finished with no failures.\")\n\n# You can inspect the scan results for detailed outcomes\n# print(scan.get_scan_results())","lang":"python","description":"This quickstart demonstrates how to run a Soda Scan programmatically against a BigQuery data source. It selects the data source with `set_data_source_name`, configures the BigQuery connection, and defines simple SodaCL data quality checks. To run it against your own data, set up BigQuery authentication (the `GOOGLE_APPLICATION_CREDENTIALS` environment variable or `gcloud auth application-default login`) and replace the placeholder `project_id`, `dataset`, and table name with your actual BigQuery resources.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":null}