SQLFrame

4.1.0 verified Fri May 01 auth: no python

SQLFrame is a Python library that translates PySpark DataFrame API calls into SQL queries for multiple database engines (BigQuery, DuckDB, Postgres, Snowflake, Spark, etc.). Version 4.1.0 requires Python >=3.10 and uses sqlglot for SQL generation. Release cadence is approximately bi-weekly.

pip install sqlframe
error ModuleNotFoundError: No module named 'sqlframe'
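cause The sqlframe package is not installed in the Python environment that is running the code.
fix
Install it together with the extra for your engine, for example pip install "sqlframe[duckdb]". Available extras: duckdb, bigquery, snowflake, postgres, spark. Quote the argument so that shells such as zsh do not expand the square brackets.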
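A ModuleNotFoundError often means the package was installed into a different interpreter than the one running the script. The following is a minimal sketch, using only the standard library, that reports which interpreter is active and whether a module can be found from it. The helper name is_installed is illustrative and not part of SQLFrame.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Show which interpreter is running, then whether it can see sqlframe.
print("interpreter:", sys.executable)
print("sqlframe importable:", is_installed("sqlframe"))
```

If the second line prints False, running the install as python -m pip install "sqlframe[duckdb]" with that same interpreter is usually the simplest remedy.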
error AttributeError: module 'sqlframe' has no attribute 'Session'
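cause The top-level sqlframe package does not export a Session class. Sessions are engine-specific and live in each engine's subpackage, for example sqlframe.duckdb.DuckDBSession or sqlframe.bigquery.BigQuerySession.
fix
Import the session class for your engine (from sqlframe.duckdb import DuckDBSession), or call sqlframe.activate(engine="duckdb") before importing pyspark.sql so that existing PySpark imports resolve to SQLFrame.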
error pyspark.sql.utils.AnalysisException: Table or view not found: ...
cause A SQL statement references a table or view that does not exist in the target database or has not been registered with the session.
fix
Register the DataFrame as a temporary view with df.createOrReplaceTempView("table_name") before querying it, or confirm that the table exists in the target database and that its name is fully qualified.
error Exception: Engine 'bigquery' not supported. Supported engines: duckdb, spark, snowflake, postgres, bigquery
cause The engine name is misspelled in the configuration, or the optional dependency for that engine is not installed.
fix
Check that the engine name is one of duckdb, spark, snowflake, postgres, or bigquery, then install the matching extra, for example pip install "sqlframe[bigquery]".
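A misspelled engine name can be caught before it reaches the library. The sketch below validates a name against the engines listed in this section and builds the matching install command. The function install_command and the constant SUPPORTED_ENGINES are hypothetical helpers written for illustration; they are not part of SQLFrame, and the real package may offer extras beyond this list.

```python
import difflib

# Engine names as listed in this section; treat the list as an assumption.
SUPPORTED_ENGINES = ("duckdb", "spark", "snowflake", "postgres", "bigquery")


def install_command(engine: str) -> str:
    """Return the pip command for an engine extra, or raise with a spelling hint."""
    name = engine.strip().lower()
    if name not in SUPPORTED_ENGINES:
        # Suggest the closest known engine name, if any is reasonably similar.
        guess = difflib.get_close_matches(name, SUPPORTED_ENGINES, n=1)
        hint = f" Did you mean {guess[0]!r}?" if guess else ""
        raise ValueError(f"Unknown engine {engine!r}.{hint}")
    # Quote the requirement so shells do not expand the square brackets.
    return f'pip install "sqlframe[{name}]"'


print(install_command("duckdb"))
```

Raising on unknown names, rather than silently falling back to a default, keeps a typo from surfacing later as a less obvious import or connection error.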
error AttributeError: 'DataFrame' object has no attribute 'show'
cause The installed version predates the .show() method (added in 3.0+).
fix
Upgrade sqlframe to >=3.0.0: pip install --upgrade sqlframe
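Before upgrading, it can help to confirm which version is actually installed. The following sketch reads the installed version through the standard library's importlib.metadata and compares it against a minimum. The helpers parse_version, meets_minimum, and installed_version are illustrative names, and the parser is deliberately simple: it keeps only the leading numeric part of the first three components and ignores suffixes such as rc1.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.Z' into a tuple, ignoring non-numeric suffixes like 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package: str = "sqlframe") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("sqlframe is not installed")
else:
    print(f"sqlframe {found}; meets 3.0.0: {meets_minimum(found, '3.0.0')}")
```

For strict comparisons that account for pre-releases, the third-party packaging library is the more robust choice; the tuple comparison here is only meant as a quick check.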
breaking As of version 4.0.0, the single-constructor pattern spark = SQLFrame(engine='duckdb') is not available. Each engine is configured through its own session class (for example DuckDBSession or BigQuerySession), or by calling activate(engine=...) so that existing PySpark imports resolve to SQLFrame.
fix Create the session from the engine's subpackage: from sqlframe.duckdb import DuckDBSession, then session = DuckDBSession() (or the equivalent class for another engine).
gotcha DataFrame transformations are lazy: nothing runs against the engine until you call an action such as .show() or .collect(). Calling .sql() on a DataFrame only returns the generated SQL string; it does not execute the query.
fix Use .show() to preview results, .collect() to get a list of Row objects, or .toPandas() to get a pandas DataFrame.
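deprecated Importing from sqlframe.sql or from the top-level package (from sqlframe import DataFrame) does not match the package layout as of version 4.0.0. Engine-specific classes and functions are imported from the engine's subpackage.
fix Change imports to the engine subpackage, for example from sqlframe.duckdb import DuckDBSession and from sqlframe.duckdb import functions as F.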
gotcha When using multiple engines in the same project, create a separate session for each engine. Sharing one session across different engine types leads to errors, because each session is bound to a single engine's connection and SQL dialect.
fix Create one session per engine: session_duckdb = DuckDBSession() (from sqlframe.duckdb) and session_bq = BigQuerySession() (from sqlframe.bigquery).
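One way to keep sessions separated is a small registry that creates each engine's session on first use and returns the same object afterwards. The sketch below is an illustration under stated assumptions: SessionRegistry is a hypothetical helper, not a SQLFrame class, and the factories are stand-ins (plain object) so the code runs without any engine installed. In real use, each factory would construct that engine's own session.

```python
from typing import Any, Callable, Dict


class SessionRegistry:
    """Creates at most one session per engine name and reuses it afterwards."""

    def __init__(self, factories: Dict[str, Callable[[], Any]]) -> None:
        self._factories = dict(factories)
        self._sessions: Dict[str, Any] = {}

    def get(self, engine: str) -> Any:
        """Return the session for an engine, creating it on first request."""
        if engine not in self._factories:
            raise KeyError(f"No session factory registered for {engine!r}")
        if engine not in self._sessions:
            self._sessions[engine] = self._factories[engine]()
        return self._sessions[engine]


# Stand-in factories (assumption): replace with real session constructors.
registry = SessionRegistry({"duckdb": object, "bigquery": object})
duckdb_session = registry.get("duckdb")
bigquery_session = registry.get("bigquery")
```

Because DataFrames are tied to the session that created them, looking sessions up by engine name makes it harder to pass a DataFrame from one engine to an operation on another by accident.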
pip install "sqlframe[bigquery]"
pip install "sqlframe[duckdb]"
pip install "sqlframe[snowflake]"
pip install "sqlframe[postgres]"
pip install "sqlframe[spark]"

Quickstart using DuckDB engine (no external database needed). Set up a session, create a DataFrame, apply filters, and inspect generated SQL.

from sqlframe.duckdb import DuckDBSession
from sqlframe.duckdb import functions as F

# Create a DuckDB-backed session (in-memory, no external database needed).
# Requires: pip install "sqlframe[duckdb]"
session = DuckDBSession()

# Create a DataFrame from a list of tuples
df = session.createDataFrame([(1, "Alice"), (2, "Bob")], schema=["id", "name"])
df.show()  # action: runs the query and prints the rows

# Apply transformations (lazy: nothing executes until an action is called)
df_filtered = df.filter(F.col("name") == "Alice").select("id")
print(df_filtered.sql())  # print the generated SQL without executing it