{"id":4480,"library":"collate-sqllineage","title":"Collate SQL Lineage","description":"Collate SQL Lineage is a Python tool designed to analyze SQL statements and extract their data lineage, mapping relationships between tables and columns. It builds upon `sqllineage` for core lineage parsing and `sqlfluff` for robust SQL parsing and linting. The library provides a programmatic interface for integrating lineage analysis into data governance, compliance, and impact analysis workflows. It is currently at version 2.1.0 and maintains an active development and release cadence.","status":"active","version":"2.1.0","language":"en","source_language":"en","source_url":"https://github.com/chenjianqu/collate-sqllineage","tags":["SQL","data lineage","data governance","SQL parsing","analysis"],"install":[{"cmd":"pip install collate-sqllineage","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"note":"As of v2.0.0, direct function calls like `run_lineage` were removed in favor of the `CollateSQLLineageRunner` class.","wrong":"from collate_sqllineage.runner import run_lineage","symbol":"CollateSQLLineageRunner","correct":"from collate_sqllineage.runner import CollateSQLLineageRunner"}],"quickstart":{"code":"from collate_sqllineage.runner import CollateSQLLineageRunner\n\n# Example SQL statement to analyze\nsql_statement = \"\"\"\nINSERT INTO target_schema.target_table (col_a, col_b)\nSELECT\n    source_schema.source_table_1.col_x,\n    source_schema.source_table_2.col_y\nFROM\n    source_schema.source_table_1\nJOIN\n    source_schema.source_table_2 ON source_schema.source_table_1.id = source_schema.source_table_2.id\nWHERE\n    source_schema.source_table_1.status = 'active';\n\"\"\"\n\n# Initialize the runner. 
Configuration can be passed here (e.g., for verbose logging).\n# runner = CollateSQLLineageRunner(config={'verbose': True})\nrunner = CollateSQLLineageRunner()\n\n# Run the lineage analysis\nlineage_result = runner.run(sql=sql_statement)\n\n# Print the extracted lineage information\nprint(\"--- Extracted SQL Lineage ---\")\nprint(f\"Source Tables: {lineage_result.get('tables', {}).get('source', [])}\")\nprint(f\"Target Tables: {lineage_result.get('tables', {}).get('target', [])}\")\n\n# The 'lineage_result' dictionary contains more detailed information including columns, statements, etc.\n# print(lineage_result)","lang":"python","description":"This quickstart demonstrates how to initialize `CollateSQLLineageRunner` and use its `run()` method to extract data lineage from a sample SQL `INSERT` statement. The output includes source and target tables, with the full result containing more granular details like column-level lineage."},"warnings":[{"fix":"Migrate existing code to use `CollateSQLLineageRunner` by instantiating the class and calling its `run()` method with the SQL string. For example, `runner = CollateSQLLineageRunner(); result = runner.run(sql=my_sql)`. Calls to the old functions raise an `ImportError` or `AttributeError`.","message":"Version 2.0.0 introduced a significant API refactor, removing direct function calls for lineage analysis in favor of the `CollateSQLLineageRunner` class. Code written against the pre-2.0.0 function API no longer works on 2.0.0 or later.","severity":"breaking","affected_versions":"<2.0.0 to >=2.0.0"},{"fix":"Test with representative SQL samples. For problematic queries, simplify them where possible or report issues to `collate-sqllineage` or its underlying dependencies.
Enable verbose logging with `CollateSQLLineageRunner(config={'verbose': True})` to get more diagnostic information on parsing failures.","message":"Lineage extraction relies on `sqllineage` and `sqlfluff`, which may not fully parse highly complex or non-standard SQL, or niche SQL dialects. Edge cases or unsupported syntax can produce incomplete or incorrect lineage results.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For production environments, pin `collate-sqllineage` and its direct dependencies to specific versions. Test against new dependency versions in development before deploying to ensure consistent lineage results. Review `collate-sqllineage`'s release notes for compatibility updates.","message":"Although `collate-sqllineage` specifies version constraints for `sqllineage` and `sqlfluff`, future releases of these libraries might introduce subtle breaking changes in parsing behavior or output format that affect lineage results, even if `collate-sqllineage`'s own API remains stable.","severity":"gotcha","affected_versions":"All versions (future dependency updates)"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}