Simple DDL Parser
Simple DDL Parser is a Python library designed to parse SQL DDL (Data Definition Language) files from various dialects, including HQL, TSQL (MSSQL), Oracle, AWS Redshift, Snowflake, MySQL, and PostgreSQL. It extracts comprehensive information about database entities like tables, columns (types, defaults, primary keys), sequences, alters, and custom types into a JSON or Python dictionary format. The library is actively maintained with frequent releases, currently at version 1.13.0, and typically sees rapid updates with more than 12 releases per year.
Common errors
-
DDLParserError: Unknown statement at LexToken(STRING,"'(value)'",line,column)
cause The parser failed to understand a `CHECK` constraint using the `IN` keyword, e.g., `CHECK(col IN ('val1', 'val2'))`.fixEnsure you are using a recent version of `simple-ddl-parser` (fix was included in `v0.26.0` and later). If the issue persists with other `IN` clauses, consider rewriting the `CHECK` statement using `OR` conditions or report an issue. -
Table is completely missing from the parsed output, or columns are incorrect, despite valid DDL.
cause The parser might encounter issues with complex index definitions (e.g., multi-column indexes with many included columns) within a `CREATE TABLE` statement, causing it to skip the entire table or misparse parts of it.fixSimplify the problematic index definitions in your DDL for parsing, or update to the latest version as bug fixes are frequently released for such edge cases. Report specific DDL examples as GitHub issues if the problem persists. -
TypeError: 'NoneType' object is not subscriptable
cause This generic error often occurs when the parser fails to correctly interpret a part of the DDL, leading to `None` being returned where a dictionary or list was expected, and subsequent code attempts to access it.fixEnable strict error reporting with `DDLParser(ddl_string, silent=False)` to get a more specific `DDLParserError`. Review the DDL for syntax that might not be fully supported by `simple-ddl-parser` or dialect-specific nuances.
Warnings
- breaking Python 3.7 and 3.8 are no longer supported. The minimum required Python version is now 3.9.
- breaking The output structure for arguments within brackets (e.g., `NULL_IF=('field')`) changed from a single string to a list of strings. Additionally, `=` and `IN` keywords are now parsed as separate tokens, which might affect custom processing of the output.
- breaking For PostgreSQL, `TIME ZONE` is now an explicit boolean keyword `with_time_zone` instead of being part of the type definition. For BigQuery, `RANGE_BUCKETS` range is now in its own `range` keyword, and `*_TRUNC` functions' second argument moved to `trunc_by`.
- gotcha By default, the parser does not raise an error if a statement cannot be parsed; instead, it silently skips the problematic statement and may produce incomplete or empty output. This can make debugging difficult.
Install
-
pip install simple-ddl-parser
Imports
- DDLParser
from simple_ddl_parser import DDLParser
- parse_from_file
from simple_ddl_parser import parse_from_file
Quickstart
from simple_ddl_parser import DDLParser
ddl_statement = """
CREATE TABLE users (
user_id INT PRIMARY KEY,
username VARCHAR(50) NOT NULL UNIQUE,
email VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
"""
parser = DDLParser(ddl_statement)
parsed_data = parser.run()
print(parsed_data)
# Example with dialect-specific output (e.g., HQL)
hql_ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS database.table_name (
day_long_nm string,
calendar_dt date,
source_batch_id string
)
PARTITIONED BY (batch_id int)
STORED AS PARQUET
LOCATION 's3://datalake/table_name/v1';
"""
hql_parser = DDLParser(hql_ddl)
hql_parsed_data = hql_parser.run(output_mode='hql')
print(hql_parsed_data)