{"id":3206,"library":"pandasql","title":"Pandasql","description":"Pandasql is a Python library that allows users to query pandas DataFrames using SQL syntax. It functions similarly to `sqldf` in R, leveraging SQLite under the hood to provide a familiar interface for data manipulation and analysis for those comfortable with SQL. It is currently at version 0.7.3 and receives limited updates, with alternatives like DuckDB or Polars SQL often recommended for more active development or performance needs.","status":"maintenance","version":"0.7.3","language":"en","source_language":"en","source_url":"https://github.com/yhat/pandasql/","tags":["pandas","sql","dataframe","sqlite","data analysis","query"],"install":[{"cmd":"pip install pandasql","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"Core functionality relies on pandas DataFrames.","package":"pandas"}],"imports":[{"note":"This is the primary function used to execute SQL queries on DataFrames.","symbol":"sqldf","correct":"from pandasql import sqldf"}],"quickstart":{"code":"import pandas as pd\nfrom pandasql import sqldf\n\n# Create a sample pandas DataFrame\ndf = pd.DataFrame({\n    'name': ['Alice', 'Bob', 'Charlie'],\n    'age': [25, 30, 22],\n    'city': ['New York', 'Los Angeles', 'Chicago']\n})\n\n# Define an SQL query as a string\nquery = \"\"\"\nSELECT name, age\nFROM df\nWHERE age > 23\nORDER BY age DESC\n\"\"\"\n\n# Execute the SQL query using sqldf\nresult_df = sqldf(query)\n\nprint(result_df)","lang":"python","description":"This quickstart demonstrates how to import `sqldf`, create a pandas DataFrame, define a SQL query as a string referencing the DataFrame by its variable name, and then execute the query to get a new DataFrame as output."},"warnings":[{"fix":"Reduce the number of columns in your query, split complex queries into smaller parts, or consider processing data in chunks. For very large datasets or complex analytical needs, consider using dedicated database engines or more performant alternatives like DuckDB or Polars SQL.","message":"Pandasql uses SQLite as its backend, which has inherent limitations, such as a default limit of around 999 SQL variables/parameters. This can lead to `OperationalError: too many SQL variables` when querying very wide DataFrames (many columns) or complex queries with numerous parameters.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure DataFrames are defined in the same scope where `sqldf` is called, or explicitly pass `globals()` or `locals()` as the second argument to `sqldf`, e.g., `sqldf(query, globals())` or `sqldf(query, locals())`.","message":"For `sqldf` to correctly identify DataFrames, they must be available in the global or local scope where `sqldf` is called. If `sqldf` is used within a function and the DataFrame is not passed into the function's `locals()` or `globals()`, you might encounter `no such table` errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Perform data modification (updates, inserts, deletes) or schema changes directly using pandas DataFrame methods.","message":"Pandasql only supports Data Query Language (DQL) operations (e.g., SELECT statements). It does not support Data Manipulation Language (DML) like INSERT, UPDATE, DELETE, or Data Definition Language (DDL) like CREATE, ALTER, DROP for modifying DataFrames or their structure.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Evaluate and consider migrating to more actively maintained libraries such as DuckDB or Polars if performance, new features, or long-term support are critical.","message":"The `pandasql` library receives limited updates and is less actively maintained compared to other SQL-on-DataFrame solutions. For new projects or performance-critical applications, modern alternatives like `duckdb` (via `duckdb.query_df`), Polars SQL, or even native `pandas.DataFrame.query` are often recommended for better performance and ongoing development.","severity":"deprecated","affected_versions":"All versions (future-looking)"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}