SQL Lineage Analysis Tool
SQLLineage is a Python library designed for SQL lineage analysis, capable of identifying source and target tables, as well as providing column-level lineage from SQL queries. It leverages popular SQL parser libraries like sqlfluff and sqlparse, and uses networkx for graph representation. The library is actively maintained, with its current version being 1.5.7, and sees regular minor releases to introduce enhancements and bug fixes.
Warnings
- breaking Starting with v1.5.x, `ansi` is the default SQL dialect. This might cause parsing issues for non-ANSI compliant SQL that previously worked without explicit dialect specification. Non-validating dialects are targeted for deprecation in v1.6.
- gotcha Column-level lineage can be inaccurate or incomplete without providing table metadata. For queries involving `SELECT *` or unqualified column names, `sqllineage` cannot fully resolve column dependencies without schema information.
- breaking Python 3.9 support was dropped, and Python 3.14 support was added in `v1.5.7`. Python 3.8 was deprecated in `v1.5.4`. Ensure your environment uses a compatible Python version.
- gotcha Complex SQL patterns, deeply nested subqueries, large UNION chains, dynamic table/column name resolution, or vendor-specific functions can challenge generic SQL parsers, including `sqllineage`, potentially leading to incomplete or incorrect lineage.
Install
-
pip install sqllineage
Imports
- LineageRunner
from sqllineage.runner import LineageRunner
Quickstart
from sqllineage.runner import LineageRunner
sql = "INSERT INTO target_schema.target_table SELECT col1, col2 FROM source_schema.source_table WHERE col3 > 100"
runner = LineageRunner(sql)
print(f"Source Tables: {[str(t) for t in runner.source_tables]}")
print(f"Target Tables: {[str(t) for t in runner.target_tables]}")
# For column-level lineage (requires metadata for full accuracy)
# from sqllineage.core.metadata_provider import DummyMetaDataProvider
# metadata = {
# 'source_schema.source_table': ['col1', 'col2', 'col3']
# }
# runner_with_metadata = LineageRunner(sql, metadata_provider=DummyMetaDataProvider(metadata))
# for path in runner_with_metadata.get_column_lineage():
# print(f"Column Lineage: {path.source.column} -> {path.target.column}")