OpenLineage SQL
The `openlineage-sql` library provides a Python interface to a high-performance Rust library for SQL lineage extraction. It parses SQL statements to identify input tables, output tables, and column-level lineage, which supports data governance and ETL pipeline observability. The current version is 1.46.0 and requires Python 3.10 or newer. The package is actively maintained, with frequent releases that track improvements in the core Rust library and the broader OpenLineage specification.
Warnings
- gotcha Pre-built wheels may not be available for less common operating systems or architectures. In that case pip falls back to building from source, which fails with compilation errors unless a Rust toolchain is installed on the system.
- gotcha Although many SQL dialects are supported, complex, non-standard, or highly dynamic SQL (e.g., macros, stored procedures, or unusual syntax) can yield incomplete or inaccurate lineage.
- gotcha The `parse` function returns a `SqlMeta` object, not a plain list of tables. Read the lineage details from its attributes, such as `.in_tables`, `.out_tables`, `.column_lineage`, and `.errors`.
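The last point can be sketched as a small helper that flattens a parse result into plain Python data. This is a minimal sketch, not part of the library: `summarize` and `_first_attr` are hypothetical names, and because the exact attribute names are treated as an assumption here, the helper probes for `in_tables`/`out_tables` first and `inputs`/`outputs` second. A stand-in object is used so the example runs without the package installed.

```python
from types import SimpleNamespace


def _first_attr(obj, names):
    """Return the first attribute in `names` that exists on obj, else an empty list."""
    for name in names:
        if hasattr(obj, name):
            return getattr(obj, name)
    return []


def summarize(meta):
    """Flatten a parse result into a dict of table-name strings.

    Assumption: `meta` exposes its table collections under one of the
    attribute names probed below. Illustrative helper, not a library API.
    """
    inputs = _first_attr(meta, ("in_tables", "inputs"))
    outputs = _first_attr(meta, ("out_tables", "outputs"))
    return {
        "inputs": sorted(str(t) for t in inputs),
        "outputs": sorted(str(t) for t in outputs),
    }


# Stand-in object so the sketch runs without the library installed.
fake_meta = SimpleNamespace(in_tables=["input_table", "other_table"], out_tables=[])
print(summarize(fake_meta))
```

With the real package, the object returned by `parse` would take the place of `fake_meta`.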
Install
pip install openlineage-sql
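When an install fails or behaves unexpectedly, it can help to check whether the module is importable and which platform pip was selecting a wheel for. The following is a minimal sketch using only the standard library; `describe_install` is a hypothetical helper, and the default module name `openlineage_sql` is an assumption about the importable name.

```python
import importlib.util
import platform
import sys


def describe_install(module_name="openlineage_sql"):
    """Report whether a module is importable, plus the platform details
    that determine which pre-built wheel pip can select."""
    try:
        found = importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        found = False
    return {
        "module": module_name,
        "installed": found,
        "python": ".".join(map(str, sys.version_info[:3])),
        "system": platform.system(),
        "machine": platform.machine(),
    }


print(describe_install())
```

If `installed` is false on an uncommon `system`/`machine` combination, a missing pre-built wheel is a likely cause, and a Rust toolchain would be needed to build from source.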
Imports
- parse
from openlineage_sql import parse
Quickstart
from openlineage_sql import parse
# parse() takes a list of SQL statements rather than a single string
sql_query = "SELECT a, b FROM input_table JOIN other_table ON input_table.id = other_table.id WHERE a > 10"
result = parse([sql_query])
print(f"Input Tables: {result.in_tables}")
print(f"Output Tables: {result.out_tables}")
print(f"Errors: {result.errors}")
# Example with DDL
sql_ddl = "CREATE TABLE new_table (id INT, name VARCHAR(255))"
ddl_result = parse([sql_ddl])
print(f"\nDDL Output Tables: {ddl_result.out_tables}")
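Because extraction can be incomplete for unusual SQL, a pipeline may want to parse statements one at a time and record which ones failed. The following is a sketch under stated assumptions: `collect_lineage` is a hypothetical helper, the parser is passed in as `parse_fn` (assumed to accept a list of SQL strings) so the example runs without the library, and the result is assumed to expose `in_tables` and `out_tables`.

```python
from types import SimpleNamespace


def collect_lineage(statements, parse_fn):
    """Parse each statement separately so one failure does not hide the rest.

    Assumptions: parse_fn accepts a list of SQL strings and returns an object
    with `in_tables` and `out_tables`; both names are illustrative here.
    """
    report = []
    for sql in statements:
        entry = {"sql": sql, "inputs": [], "outputs": [], "error": None}
        try:
            meta = parse_fn([sql])
            entry["inputs"] = [str(t) for t in getattr(meta, "in_tables", [])]
            entry["outputs"] = [str(t) for t in getattr(meta, "out_tables", [])]
        except Exception as exc:  # keep going and record what went wrong
            entry["error"] = str(exc)
        report.append(entry)
    return report


def fake_parse(sql_list):
    """Stand-in parser used only to make this sketch runnable."""
    if "BROKEN" in sql_list[0]:
        raise ValueError("could not parse statement")
    return SimpleNamespace(in_tables=["input_table"], out_tables=["new_table"])


statements = ["INSERT INTO new_table SELECT * FROM input_table", "BROKEN"]
for row in collect_lineage(statements, fake_parse):
    print(row["sql"], "->", row["error"] or (row["inputs"], row["outputs"]))
```

With the package installed, the imported `parse` function would be passed in place of `fake_parse`. Parsing per statement trades some speed for a report that pinpoints exactly which statement produced no lineage.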