qpd: Query Pandas Using SQL

0.4.4 · active · verified Wed Apr 15

QPD (Query Pandas Dataframes) is a Python library for running SQL `SELECT` statements on pandas-like dataframes, including Pandas, Dask, and Ray (via Modin on Ray). It translates SQL directly into dataframe operations and prioritizes correctness and consistent behavior across backends. Where SQL semantics differ from pandas defaults, such as how `GROUP BY` treats null keys, it follows SQL. The current version is 0.4.4. Releases are irregular but ongoing, and recent minor versions have focused mainly on compatibility and bug fixes.
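The null-key difference is easy to see in plain pandas, without qpd. By default, `groupby` silently drops rows whose key is null, while SQL's `GROUP BY` collects all NULL keys into a single group. Passing `dropna=False` (available since pandas 1.1) reproduces the SQL grouping. This sketch only illustrates the semantic gap that qpd is described as handling. It is not how qpd implements it internally.

```python
import pandas as pd

# Two of the four rows have a null grouping key
df = pd.DataFrame({"k": ["a", None, "a", None], "v": [1, 2, 3, 4]})

# Default pandas: rows with a null key are dropped before aggregation
default_sums = df.groupby("k")["v"].sum()

# SQL-like: null keys form their own group (requires pandas >= 1.1)
sql_like_sums = df.groupby("k", dropna=False)["v"].sum()

print(default_sums)
print(sql_like_sums)
```

The default result has one group (`a` summing to 4), while the SQL-like result also has a null-key group summing to 6.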

Quickstart

This quickstart uses `qpd` to run a SQL `SELECT` query on a Pandas DataFrame. It creates a sample DataFrame, defines a query that uses `WHERE`, `GROUP BY`, `HAVING`, and `ORDER BY` clauses, and passes the query to `run_sql_on_pandas`, which returns the result as a new DataFrame.

import pandas as pd
from qpd_pandas import run_sql_on_pandas

# Create a sample Pandas DataFrame
data = {
    'id': [1, 2, 3, 4, 5, 6, 7, 8],
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Heidi'],
    'age': [25, 30, 35, 25, 40, 30, 35, 25],
    'city': ['New York', 'London', 'Paris', 'London', 'New York', 'Paris', 'London', 'New York']
}
df = pd.DataFrame(data)

# Define an SQL query
sql_query = """
    SELECT city, AVG(age) AS avg_age, COUNT(id) AS num_people
    FROM df
    WHERE age > 25
    GROUP BY city
    HAVING COUNT(id) > 1
    ORDER BY avg_age DESC
"""

# Run the SQL query; tables are passed as a dict mapping each name used in FROM to a DataFrame
result_df = run_sql_on_pandas(sql_query, {"df": df})

print(result_df)

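To show how each SQL clause maps onto a dataframe operation, here is the same query written by hand in plain pandas. This is an illustrative sketch that assumes pandas 1.1 or later. It is not the code qpd generates, and qpd may differ in details such as null handling and the order of tied rows. With this data, only one New York row survives `WHERE age > 25`, so `HAVING` removes New York. London and Paris each keep two people, with an average age of 32.5.

```python
import pandas as pd

# Same sample data as the quickstart
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "Grace", "Heidi"],
    "age": [25, 30, 35, 25, 40, 30, 35, 25],
    "city": ["New York", "London", "Paris", "London", "New York", "Paris", "London", "New York"],
})

# WHERE age > 25
filtered = df[df["age"] > 25]

# GROUP BY city, computing AVG(age) and COUNT(id); "count" skips nulls like SQL COUNT(col).
# dropna=False keeps a null city as its own group, as SQL would.
grouped = (
    filtered.groupby("city", dropna=False)
    .agg(avg_age=("age", "mean"), num_people=("id", "count"))
    .reset_index()
)

# HAVING COUNT(id) > 1
having = grouped[grouped["num_people"] > 1]

# ORDER BY avg_age DESC (ties, like London and Paris here, have no guaranteed order)
result = having.sort_values("avg_age", ascending=False).reset_index(drop=True)
print(result)
```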