SQL Lineage Analysis Tool

1.5.7 · active · verified Sat Apr 11

SQLLineage is a Python library designed for SQL lineage analysis, capable of identifying source and target tables, as well as providing column-level lineage from SQL queries. It leverages popular SQL parser libraries like sqlfluff and sqlparse, and uses networkx for graph representation. The library is actively maintained, with its current version being 1.5.7, and sees regular minor releases to introduce enhancements and bug fixes.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to initialize `LineageRunner` with a SQL statement and extract the source and target tables. For accurate column-level lineage, especially with `SELECT *` or unqualified columns, providing a `MetaData` object (e.g., via `DummyMetaDataProvider` or `SQLAlchemyMetaDataProvider`) is crucial.

from sqllineage.runner import LineageRunner

sql = "INSERT INTO target_schema.target_table SELECT col1, col2 FROM source_schema.source_table WHERE col3 > 100"

runner = LineageRunner(sql)

print(f"Source Tables: {[str(t) for t in runner.source_tables]}")
print(f"Target Tables: {[str(t) for t in runner.target_tables]}")

# For column-level lineage (requires metadata for full accuracy)
# from sqllineage.core.metadata_provider import DummyMetaDataProvider
# metadata = {
#     'source_schema.source_table': ['col1', 'col2', 'col3']
# }
# runner_with_metadata = LineageRunner(sql, metadata_provider=DummyMetaDataProvider(metadata))
# for path in runner_with_metadata.get_column_lineage():
#     print(f"Column Lineage: {path.source.column} -> {path.target.column}")

view raw JSON →