Pandasql

0.7.3 · maintenance · verified Sat Apr 11

Pandasql is a Python library for querying pandas DataFrames with SQL syntax. It works similarly to `sqldf` in R and uses SQLite under the hood, giving users who are comfortable with SQL a familiar interface for data manipulation and analysis. The current version is 0.7.3, and the project receives only limited updates; alternatives such as DuckDB or Polars SQL are often recommended when active development or performance matters.
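To make the "SQLite under the hood" idea concrete, here is a minimal sketch of the same pattern using only pandas and the standard-library `sqlite3` module. It is an approximation for illustration, not pandasql's actual implementation; the connection name `conn` and the table name `df` are choices made for this example.

```python
import sqlite3

import pandas as pd

# Sample data (illustrative only)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
})

# Copy the DataFrame into an in-memory SQLite database as a table named "df"
conn = sqlite3.connect(':memory:')
df.to_sql('df', conn, index=False)

# Run SQL against that table and read the result back as a DataFrame
result = pd.read_sql_query(
    'SELECT name, age FROM df WHERE age > 23 ORDER BY age DESC',
    conn,
)
conn.close()

print(result)
```

The round trip (DataFrame to SQLite table, SQL query, result back to a DataFrame) is the general shape of what a SQL-on-DataFrames layer has to do, and it also hints at why copying data into SQLite can be slower than engines that query DataFrames in place.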

Warnings

Install

Imports

Quickstart

This quickstart imports `sqldf`, creates a pandas DataFrame, defines a SQL query string that references the DataFrame by its variable name, and executes the query to return a new DataFrame.

import pandas as pd
from pandasql import sqldf

# Create a sample pandas DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'city': ['New York', 'Los Angeles', 'Chicago']
})

# Define an SQL query as a string
query = """
SELECT name, age
FROM df
WHERE age > 23
ORDER BY age DESC
"""

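# Execute the SQL query using sqldf.
# With no environment argument, sqldf looks up `df` in the calling scope;
# pass locals() or globals() as the second argument to be explicit.
result_df = sqldf(query)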

print(result_df)
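As a sanity check on the quickstart query, the same filter, column selection, and ordering can be written in plain pandas. This is a sketch for comparison only; the name `expected` is introduced here and is not part of pandasql.

```python
import pandas as pd

# Same sample data as the quickstart
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'city': ['New York', 'Los Angeles', 'Chicago']
})

# WHERE age > 23, SELECT name, age, ORDER BY age DESC
expected = (
    df.loc[df['age'] > 23, ['name', 'age']]
    .sort_values('age', ascending=False)
    .reset_index(drop=True)
)

print(expected)
```

Comparing `expected` with the DataFrame returned by `sqldf` is a quick way to confirm that a SQL query and its pandas counterpart agree.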
