Collate SQLFluff
collate-sqlfluff is a fork of SQLFluff (the SQL Linter for Humans), maintained by the OpenMetadata community. It provides a modular, dialect-flexible, and configurable SQL linter and auto-formatter, designed particularly for ELT applications. It supports multiple SQL dialects (e.g., BigQuery, Snowflake, PostgreSQL) and templating languages like Jinja and dbt. The library frequently syncs with upstream SQLFluff, incorporating its features and adhering to its semantic versioning, while adding specific enhancements relevant to OpenMetadata's ecosystem.
Warnings
- breaking collate-sqlfluff closely follows upstream sqlfluff releases. Major version updates in sqlfluff (e.g., 2.x to 3.x, 3.x to 4.x) introduce breaking changes to the Python API, rule coding, configuration, and CLI behavior. For instance, the default behavior of `sqlfluff fix` changed in 3.x, and 4.x introduced optional Rust routines. Consult the upstream sqlfluff release notes for migration guides before upgrading across a major version.
- gotcha Although the package is installed as `collate-sqlfluff`, it is imported as `sqlfluff` (`import sqlfluff`). Attempting to `import collate_sqlfluff` raises a `ModuleNotFoundError`.
- gotcha collate-sqlfluff uses a hierarchical configuration system where local configuration files override global ones. However, the `templater` configuration option *cannot* be set in config files located in subdirectories of the working directory; it must be set at a higher level.
- gotcha When using Jinja or dbt templating with `collate-sqlfluff`, macros within the SQL can potentially execute arbitrary code. While `sqlfluff` employs Jinja2's `SandboxedEnvironment` for some protection, users with edit access to SQL or configuration files should be aware of potential security implications, as some macros (e.g., dbt `run_query`) might execute arbitrary SQL.
Install
pip install collate-sqlfluff
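Because the distribution name and the import name differ, a quick check after installation can confirm both. The sketch below uses only the standard library; `describe_install` is a hypothetical helper name and is not part of the package.

```python
from importlib import metadata, util


def describe_install(dist_name="collate-sqlfluff", module_name="sqlfluff"):
    """Report the installed distribution version and whether the module can be found."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # the distribution is not installed in this environment
    # find_spec returns None when no top-level module of that name exists
    importable = util.find_spec(module_name) is not None
    return {
        "distribution": dist_name,
        "version": version,
        "module": module_name,
        "importable": importable,
    }


if __name__ == "__main__":
    print(describe_install())
```

If `version` is a string and `importable` is `True`, the install succeeded and `import sqlfluff` should work.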
Imports
- sqlfluff
import sqlfluff
- Linter
from sqlfluff.core import Linter, FluffConfig
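The `Linter` and `FluffConfig` classes give finer control than the top-level helpers. The following is a minimal sketch, assuming the upstream `sqlfluff.core` API (`FluffConfig(overrides=...)`, `Linter.lint_string`, and `get_violations`) is unchanged in the fork; `lint_with_config` is a hypothetical helper name.

```python
def lint_with_config(sql, dialect="bigquery"):
    """Lint one SQL string with an explicit config.

    Returns a list of (rule code, line number, description) tuples.
    """
    # Imported inside the function so that defining the helper does not
    # require the package to be installed; calling it does.
    from sqlfluff.core import FluffConfig, Linter

    # Assumption: passing the dialect through `overrides` is enough to build
    # a usable config, as in upstream sqlfluff.
    config = FluffConfig(overrides={"dialect": dialect})
    linter = Linter(config=config)
    linted_file = linter.lint_string(sql)
    return [
        (violation.rule_code(), violation.line_no, violation.desc())
        for violation in linted_file.get_violations()
    ]


# Example call (requires collate-sqlfluff to be installed):
# for code, line_no, description in lint_with_config("SeLEct 1 from tbl\n"):
#     print(f"{code} (line {line_no}): {description}")
```

Building the config explicitly keeps the dialect choice in code, which can be convenient when linting strings that do not live next to a config file.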
Quickstart
import sqlfluff
my_bad_query = "SeLEct *, 1, blah as fOO from mySchema.myTable"
# Lint the given string and return violations
lint_result = sqlfluff.lint(my_bad_query, dialect="bigquery")
print("Linting Results:", lint_result)
# Fix the given string and get a fixed string back
fix_result = sqlfluff.fix(my_bad_query, dialect="bigquery")
print("Fixed Query:\n", fix_result)
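`sqlfluff.lint` returns a list of dictionaries, one per violation, so the result can be post-processed with ordinary Python. The sketch below counts violations per rule code. It assumes each record carries a `code` key (other keys vary between major versions); `count_by_rule` is a hypothetical helper, and the sample records are hand-written placeholders rather than real linter output.

```python
from collections import Counter


def count_by_rule(violations):
    """Count violation records per rule code; records without a code count as 'unknown'."""
    return Counter(record.get("code", "unknown") for record in violations)


# Hand-written placeholder records shaped like lint output (not real results).
sample_violations = [
    {"code": "CP01", "description": "placeholder description"},
    {"code": "CP01", "description": "placeholder description"},
    {"code": "AM04", "description": "placeholder description"},
]

counts = count_by_rule(sample_violations)
for code, total in counts.most_common():
    print(f"{code}: {total}")
```

With the Quickstart variables, `count_by_rule(lint_result)` would give the same kind of summary for the real violations.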