dbt-extractor
dbt-extractor is a Python library that processes Jinja templates within dbt model files to analyze and extract metadata such as `ref`, `source`, and `config` calls. It is part of the `dbt-labs/dbt-parser-generator` repository. The tool, currently at version 0.6.0, prioritizes 100% certainty in its extraction, raising an exception if it cannot confidently extract values, rather than risking incorrect or incomplete output.
Warnings
- gotcha Building `dbt-extractor` from source requires a Rust toolchain (specifically `cargo`) to compile its underlying components. If `pip install` cannot use a prebuilt wheel for your platform and Rust is not installed, the build fails with compilation errors.
- gotcha The library's core strategy is to be 100% certain about its extractions. If it encounters Jinja that it cannot confidently parse, it raises an `ExtractionError` instead of returning potentially incomplete or incorrect results. Some valid dbt Jinja therefore cannot be processed by `dbt-extractor`, and callers need a fallback such as full Jinja rendering.
- gotcha `dbt-extractor` focuses on Jinja syntax extraction and does not perform validation of the underlying SQL syntax, schema existence, or data types. Errors in these areas will not be caught by `dbt-extractor` during compilation and will only manifest at runtime when dbt executes the SQL against your data warehouse.
- bug There is an open bug where installation fails on free-threaded Python 3.14t, reporting an `ImportError: DLL load failed while importing dbt_extractor`.
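The certainty-first behavior described above suggests a calling pattern: try static extraction first, and hand the source to a slower path when the extractor refuses. The following is a minimal sketch of that pattern, not the library's own code. The helper `extract_or_fallback`, the stub extractor, and `StubExtractionError` are hypothetical names introduced for illustration; the extractor and its error type are passed in as arguments so the sketch runs without the package installed.

```python
from typing import Any, Callable, Optional, Type


def extract_or_fallback(
    source: str,
    extractor: Callable[[str], Any],
    error_type: Type[Exception],
    fallback: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Try static extraction; on failure, defer to `fallback` (or return None)."""
    try:
        return extractor(source)
    except error_type:
        # The extractor refused to guess, so hand the source to a slower path
        # (for example, full Jinja rendering) if one was supplied.
        return fallback(source) if fallback is not None else None


# Hypothetical stand-ins so the sketch is runnable on its own.
class StubExtractionError(Exception):
    pass


def stub_extractor(source: str) -> dict:
    # Pretend that any `{% ... %}` block is too dynamic to extract statically.
    if "{%" in source:
        raise StubExtractionError("cannot extract with certainty")
    return {"refs": [], "sources": [], "configs": []}


ok = extract_or_fallback("select 1", stub_extractor, StubExtractionError)
deferred = extract_or_fallback(
    "{% if target.name == 'prod' %} select 1 {% endif %}",
    stub_extractor,
    StubExtractionError,
    fallback=lambda src: "needs full rendering",
)
```

With the real package, the library's extraction function and its `ExtractionError` would take the place of the stubs.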
Install
pip install dbt-extractor
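On a platform where `pip` may need to compile the package, it can help to confirm beforehand that `cargo` is on the `PATH`. The sketch below uses only the standard library; the function name `has_rust_toolchain` is an illustrative choice, not part of `dbt-extractor`.

```python
import shutil


def has_rust_toolchain() -> bool:
    """Return True if a `cargo` executable is found on PATH."""
    return shutil.which("cargo") is not None


if not has_rust_toolchain():
    print("cargo not found: a source build of dbt-extractor would fail")
```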
Imports
- py_extract_from_source
from dbt_extractor import py_extract_from_source
- ExtractionError
from dbt_extractor import ExtractionError
Quickstart
from dbt_extractor import ExtractionError, py_extract_from_source

# A dbt model containing `ref`, `source`, and `config` calls.
dbt_model_content = """
SELECT
{{ ref('my_model') }} as model_data,
{{ source('my_schema', 'my_table') }} as source_data,
{{ config(materialized='table') }}
FROM some_table
"""

try:
    # The result is a dict with "refs", "sources", and "configs" keys.
    extracted_data = py_extract_from_source(dbt_model_content)
    print("Extracted Refs:", extracted_data["refs"])
    print("Extracted Sources:", extracted_data["sources"])
    print("Extracted Configs:", extracted_data["configs"])
except ExtractionError as e:
    # Raised when the Jinja cannot be extracted with full certainty.
    print(f"Extraction failed: {e}")
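Once configs have been extracted, a common next step is to fold them into a single mapping. The sketch below assumes the extracted configs arrive as a list of `(key, value)` pairs, with later `config` calls overriding earlier ones; that shape is an assumption about the result, so verify it against the version you have installed. The helper `merge_config_calls` and the sample data are hypothetical and hand-written, not real extractor output.

```python
from typing import Any, Dict, Iterable, Tuple


def merge_config_calls(configs: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Fold (key, value) pairs into one dict; later pairs override earlier ones."""
    merged: Dict[str, Any] = {}
    for key, value in configs:
        merged[key] = value
    return merged


# Hand-written sample in the assumed shape, not real extractor output.
sample_configs = [
    ("materialized", "table"),
    ("tags", ["daily"]),
    ("materialized", "view"),
]
merged = merge_config_calls(sample_configs)
print(merged)
```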