Elementary Data
Elementary Data is an open-source Python library and CLI tool that provides dbt-native data observability, including data monitoring, lineage, and data quality checks. It works with the Elementary dbt package, which collects metadata in your data warehouse, to generate comprehensive reports and alerts. The current version is 0.23.1, and new releases ship frequently.
Warnings
- breaking The `elementary-data` Python CLI works in conjunction with the `elementary-data/elementary` dbt package. Frequent updates to the dbt package can introduce incompatibilities if the CLI and dbt package versions are not aligned. Always refer to the official documentation for recommended version pairings.
- deprecated In version 0.23.0, `datetime.utcnow()` (deprecated since Python 3.12) was replaced with `datetime.now(tz=timezone.utc)`, so the affected timestamps are now timezone-aware rather than naive. Code that interacts directly with Elementary's internal datetime handling may need updating, because Python raises `TypeError` when ordering comparisons mix naive and aware datetimes.
- gotcha While v0.19.2 addressed a specific backwards compatibility issue with Pydantic, broader Pydantic major version transitions (e.g., v1 to v2) can still cause issues if other libraries in your environment have conflicting Pydantic versions. This can lead to unexpected runtime errors.
- gotcha In v0.20.0, Elementary stopped passing dbt's deprecated `-m` (`--models`) flag in its internal dbt invocations. Older versions of `elementary-data` run against newer `dbt` versions may therefore fail or behave unexpectedly when executing dbt commands.
- gotcha The `edr` CLI requires a correctly configured `profiles.yml` (in dbt's format) containing a profile named exactly `elementary` to connect to your data warehouse. Misconfiguring this profile is a common cause of connection issues; running `dbt run-operation elementary.generate_elementary_cli_profile` in your dbt project prints a template for it.
- gotcha Prior to v0.19.4, `float("inf")` (infinity) and `float("nan")` (not a number) values in your data could cause JSON serialization failures when `elementary-data` generated reports, so reports could be corrupted or fail to generate at all.
- gotcha The `elementary_sdk` Python package (distinct from `elementary-data` CLI) is primarily for programmatically sending data quality information to Elementary Cloud and testing Python pipelines. It's not intended for generating the self-hosted reports that the `edr` CLI provides. Confusing their purposes or expecting CLI features from the SDK (or vice-versa) can lead to integration challenges.
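The 0.23.0 timestamp change mostly matters where naive and timezone-aware datetimes meet. The sketch below uses only the standard library; `utc_now` and `ensure_aware_utc` are hypothetical helper names, not Elementary functions, and treating naive values as UTC is an assumption that matches how `utcnow()` results behaved.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware UTC datetime.

    Same pattern as datetime.now(tz=timezone.utc), which replaced the
    naive (and deprecated) datetime.utcnow().
    """
    return datetime.now(tz=timezone.utc)


def ensure_aware_utc(value: datetime) -> datetime:
    """Normalize a datetime to aware UTC.

    Assumption: naive values already represent UTC, as utcnow() values did;
    aware values in other zones are converted to UTC.
    """
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


if __name__ == "__main__":
    legacy = datetime(2024, 1, 1, 12, 0)  # naive, like a utcnow() result
    print(ensure_aware_utc(legacy).isoformat())  # 2024-01-01T12:00:00+00:00

    # Ordering comparisons between naive and aware datetimes raise TypeError,
    # the usual symptom when old and new timestamps are mixed.
    try:
        _ = legacy < utc_now()
    except TypeError as exc:
        print("cannot compare:", exc)
```

Normalizing at the boundary where external timestamps enter your code keeps every later comparison aware-to-aware.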
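A quick shape check of the `elementary` profile can catch configuration mistakes before `edr` tries to connect. This is a sketch under assumptions: `check_elementary_profile` is a hypothetical helper, it takes the already-parsed contents of `profiles.yml` as a dict, and it validates only the usual dbt layout (a default `target` plus an `outputs` mapping whose entries declare an adapter `type`), not credentials or connectivity.

```python
from typing import Any, Dict, List


def check_elementary_profile(profiles: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed profiles.yml mapping.

    Hypothetical helper: checks only the dbt profile shape
    (profile name -> target + outputs), not whether credentials work.
    """
    profile = profiles.get("elementary")
    if not isinstance(profile, dict):
        return ["no profile named 'elementary'"]

    problems: List[str] = []
    outputs = profile.get("outputs")
    if not isinstance(outputs, dict) or not outputs:
        problems.append("'elementary' profile has no outputs")
        return problems

    # The default target should name one of the defined outputs.
    target = profile.get("target")
    if target is None:
        problems.append("'elementary' profile has no default 'target'")
    elif target not in outputs:
        problems.append(f"target {target!r} is not defined under outputs")

    # Every output needs an adapter type (snowflake, bigquery, ...).
    for name, output in outputs.items():
        if not isinstance(output, dict) or "type" not in output:
            problems.append(f"output {name!r} is missing an adapter 'type'")
    return problems
```

Assuming PyYAML is installed, a real file can be checked with `check_elementary_profile(yaml.safe_load((Path.home() / ".dbt" / "profiles.yml").read_text()))`; an empty list means the shape looks right.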
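The infinity/NaN failure mode is easy to reproduce with Python's `json` module. The sketch below is a generic workaround, not Elementary's actual v0.19.4 fix: the hypothetical `sanitize_non_finite` replaces non-finite floats with `None` (JSON `null`) so strict serialization succeeds.

```python
import json
import math
from typing import Any


def sanitize_non_finite(value: Any) -> Any:
    """Recursively replace inf, -inf and NaN floats with None (JSON null)."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: sanitize_non_finite(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize_non_finite(v) for v in value]
    return value


row = {"metric": "row_count", "value": float("inf"), "samples": [1.0, float("nan")]}

# By default json.dumps emits the non-standard tokens Infinity/NaN;
# strict mode (allow_nan=False) rejects them with ValueError.
try:
    json.dumps(row, allow_nan=False)
except ValueError as exc:
    print("strict JSON failed:", exc)

print(json.dumps(sanitize_non_finite(row), allow_nan=False))
# {"metric": "row_count", "value": null, "samples": [1.0, null]}
```

Mapping both values to `null` loses the distinction between infinity and NaN; encoding them as strings is an alternative if that distinction matters downstream.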
Install
pip install elementary-data
pip install 'elementary-data[all]'
pip install 'elementary-data[snowflake]' # Or bigquery, redshift, databricks, etc.
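Since mismatched versions are behind several of the warnings above, it can help to print what is actually installed. A minimal sketch using the standard library's `importlib.metadata`; the package list and the helper names `installed_versions` and `major_version` are illustrative assumptions. It reports only pip-installed distributions: the `elementary-data/elementary` dbt package version is pinned in `packages.yml` and has to be compared against the CLI version separately, using the official compatibility guidance.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# PyPI distribution names: the Elementary CLI plus dependencies the warnings mention.
PACKAGES = ("elementary-data", "dbt-core", "pydantic")


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def major_version(ver: Optional[str]) -> Optional[int]:
    """Return the leading integer of a version string such as '2.7.1'."""
    if not ver:
        return None
    head = ver.split(".", 1)[0]
    return int(head) if head.isdigit() else None


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:16} {ver or 'not installed'}")
    pydantic_ver = installed_versions(["pydantic"])["pydantic"]
    print("pydantic major version:", major_version(pydantic_ver))
```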
Quickstart
# 1. Install the Elementary dbt package (in your dbt project's packages.yml):
# packages:
#   - package: elementary-data/elementary
#     version: 0.23.0  # Use the latest version compatible with your CLI
# 2. Add configuration to your dbt_project.yml (example):
# models:
#   elementary:
#     +schema: "elementary"
# 3. Install dbt dependencies and build Elementary models:
# dbt deps
# dbt run --select elementary
# dbt test
# 4. Install the Elementary CLI:
pip install elementary-data
# 5. Ensure your dbt profiles.yml is configured with an 'elementary' profile
#    (see Elementary docs for details, typically ~/.dbt/profiles.yml)
# 6. Generate the data observability report:
#    The report will be saved as an HTML file in 'target/elementary_report.html'
edr report --project-dir $(pwd)
# Or to monitor and send alerts:
# edr monitor --slack-token $SLACK_TOKEN --slack-channel-name '#data-alerts'
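If the report step runs from a Python-based scheduler or CI script rather than a shell, the `edr report` command from step 6 can be wrapped with `subprocess`. This sketch assumes `edr` is on `PATH` after `pip install elementary-data`; `build_edr_report_cmd` and `run_edr_report` are hypothetical helpers, and the only flag used is the `--project-dir` shown in the quickstart.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_edr_report_cmd(project_dir: Path) -> List[str]:
    """Build the quickstart's `edr report --project-dir <dir>` as an argv list."""
    return ["edr", "report", "--project-dir", str(project_dir)]


def run_edr_report(project_dir: Path) -> int:
    """Run `edr report` and return its exit code.

    Raises FileNotFoundError when the edr executable is not on PATH,
    which usually means elementary-data is not installed in this environment.
    """
    if shutil.which("edr") is None:
        raise FileNotFoundError("edr not found on PATH; run `pip install elementary-data`")
    completed = subprocess.run(build_edr_report_cmd(project_dir), check=False)
    return completed.returncode


if __name__ == "__main__":
    if shutil.which("edr"):
        print("edr report exited with", run_edr_report(Path.cwd()))
    else:
        print("edr is not installed in this environment")
```

Returning the exit code instead of passing `check=True` leaves it to the caller to decide whether a failed report should fail the whole pipeline.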