OpenLineage SQL

1.46.0 · active · verified Thu Apr 09

The `openlineage-sql` library provides a Python interface to a high-performance Rust library for SQL lineage extraction. It parses SQL statements to identify the tables they read from and write to, which is useful for data governance and ETL pipeline observability. The current version is 1.46.0 and requires Python 3.10 or newer. The package is actively maintained, with frequent releases that track improvements in the core Rust library and the broader OpenLineage specification.

Warnings

Install

pip install openlineage-sql

Imports

Quickstart

This quickstart uses the `parse` function to extract the input and output tables from a SQL query and from a DDL statement.

from openlineage_sql import parse

# parse() takes a list of SQL statements and returns a SqlMeta object
sql_query = "SELECT a, b FROM input_table JOIN other_table ON input_table.id = other_table.id WHERE a > 10"
result = parse([sql_query])

# in_tables / out_tables hold the tables read and written by the statements
print(f"Input Tables: {result.in_tables}")
print(f"Output Tables: {result.out_tables}")

# Example with DDL: the created table is reported as an output
sql_ddl = "CREATE TABLE new_table (id INT, name VARCHAR(255))"
ddl_result = parse([sql_ddl])
print(f"\nDDL Output Tables: {ddl_result.out_tables}")
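
A pipeline usually runs several statements, and the tables each one reads and writes have to be merged into a single job-level view. The sketch below shows one possible way to do that merge. It deliberately does not call the library: `extract` is a hypothetical adapter you would write around `parse` to return (input table names, output table names) for one statement, and `merge_lineage` and `fake_extract` are illustrative names, not part of `openlineage-sql`. Treating a table written earlier in the same job as intermediate rather than as an input is a design assumption of this sketch.

```python
from typing import Callable, Iterable

# Hypothetical adapter type: maps one SQL statement to (input tables, output tables).
Extract = Callable[[str], tuple[Iterable[str], Iterable[str]]]


def merge_lineage(statements: list[str], extract: Extract) -> dict[str, list[str]]:
    """Merge per-statement lineage into job-level inputs and outputs."""
    inputs: set[str] = set()
    outputs: set[str] = set()
    for statement in statements:
        stmt_inputs, stmt_outputs = extract(statement)
        # A table produced earlier in the same job is intermediate, not an external input.
        inputs.update(name for name in stmt_inputs if name not in outputs)
        outputs.update(stmt_outputs)
    return {"inputs": sorted(inputs), "outputs": sorted(outputs)}


def fake_extract(statement: str) -> tuple[list[str], list[str]]:
    """Stand-in for a real parser call, used only to make the sketch runnable."""
    canned = {
        "CREATE TABLE staging AS SELECT * FROM raw_events": (
            ["raw_events"],
            ["staging"],
        ),
        "INSERT INTO report SELECT * FROM staging JOIN users ON staging.uid = users.id": (
            ["staging", "users"],
            ["report"],
        ),
    }
    return canned[statement]


job = merge_lineage(
    [
        "CREATE TABLE staging AS SELECT * FROM raw_events",
        "INSERT INTO report SELECT * FROM staging JOIN users ON staging.uid = users.id",
    ],
    fake_extract,
)
print(job)
```

Here `staging` is written by the first statement and read by the second, so it appears only under outputs, while `raw_events` and `users` are reported as the job's inputs. Swapping `fake_extract` for an adapter around the real parser leaves `merge_lineage` unchanged.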
