PandasAI
PandasAI is a Python library that enhances data analysis by integrating Large Language Models (LLMs) with pandas DataFrames. It allows users to interact with their data using natural language prompts, supporting various data sources like SQL, CSV, and Excel. Currently at version 3.0.0, the library is actively developed with a frequent release cadence, often introducing alpha and beta versions before stable releases.
Common errors
- ModuleNotFoundError: No module named 'pandasai.llm'
  cause: In PandasAI v3, LLM classes no longer live under `pandasai.llm`; they moved to separate extension packages.
  fix: Install the appropriate LLM extension (e.g., `pip install pandasai-litellm`) and update imports (e.g., `from pandasai_litellm.litellm import LiteLLM`).
- AttributeError: 'Agent' object has no attribute 'clarification_questions'
  cause: The `clarification_questions`, `rephrase_query`, and `explain` methods on the `Agent` class were removed in PandasAI v3.
  fix: These methods no longer exist in v3. Use `agent.chat()` for general interaction and `agent.follow_up()` to maintain conversational context.
- UnsupportedModelError: Unsupported model: The model 'GPT-4o' doesn't exist or is not supported yet.
  cause: The specified LLM model name is incorrect or misspelled, is not yet supported by the PandasAI LLM wrapper (e.g., LiteLLM), or is not accessible with your API key.
  fix: Verify the exact model name with your LLM provider and the `pandasai-litellm` documentation. Ensure your API key has the necessary permissions. Update `pandasai-litellm` to the latest version to support newer models.
- TypeError: unsupported operand type(s) for /: 'str' and 'int' (or similar type conversion errors during code execution)
  cause: The LLM generated Python code that attempts an operation (e.g., arithmetic) on columns with incompatible data types (e.g., dividing a string column by an integer). This can happen when the LLM misinterprets the data or when model behavior drifts.
  fix: Be more explicit in your natural language prompts about expected data types (e.g., "calculate the average of the numerical 'sales' column"). Consider pre-processing your DataFrame to ensure correct dtypes, or implement explicit type casting in PandasAI 'skills' or `custom_instructions`.
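One way to apply the pre-processing advice above is to coerce columns to numeric dtypes before handing the DataFrame to PandasAI. The sketch below uses only pandas; the column names, sample values, and the `coerce_numeric` helper are made up for illustration and are not part of PandasAI.

```python
from typing import List

import pandas as pd

# Hypothetical data: 'sales' arrives as strings, as often happens with CSV exports.
raw = pd.DataFrame({
    "region": ["north", "south", "east"],
    "sales": ["100", "250", "n/a"],
})


def coerce_numeric(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    """Return a copy of df with the given columns converted to numeric dtypes.

    Values that cannot be parsed become NaN instead of raising an error.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


clean = coerce_numeric(raw, ["sales"])
print(clean.dtypes)
print(clean["sales"].mean())  # NaN values are skipped by default
```

Coercing with `errors="coerce"` trades a hard failure for missing values, so it is worth checking how many NaN entries appear before querying the data.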
Warnings
- breaking PandasAI v3 introduces significant architectural changes, particularly in how LLMs are configured and imported. LLMs are now extension-based, requiring separate installations like `pandasai-litellm`. The LLM must be configured globally using `pai.config.set()` instead of being passed directly to `SmartDataframe` or `Agent` constructors.
- breaking The `SmartDataframe` and `SmartDatalake` classes from v2 are largely superseded by a new API pattern in v3. While `SmartDataframe` may still work for single dataframes, the recommended approach is `pai.DataFrame()`. `SmartDatalake` is no longer necessary, as `pai.chat()` can query multiple dataframes directly. Also, methods like `push()` and `pull()` on DataFrames were removed in v3.0.0.
- breaking Several utility methods on the `Agent` class, such as `clarification_questions()`, `rephrase_query()`, and `explain()`, have been removed in v3.
- gotcha PandasAI currently specifies Python requirements as `<3.12,>=3.8` and may pin `pandas==1.5.3`. Installation can therefore fail, or behavior can be unexpected, on Python 3.12+ or when another package in your environment requires a newer `pandas`. For example, Pandas 3.0+ introduces breaking changes such as Copy-on-Write semantics and dedicated string dtypes, which are incompatible with `pandasai`'s pinned version.
- gotcha LLM model drift (changes in underlying LLM behavior over time) can cause previously working queries to fail or produce incorrect Python code (e.g., type mismatches, inappropriate method calls) during execution by PandasAI. This is particularly noted with OpenAI models.
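The version constraints quoted in the warnings above can be checked before installing. The following is a minimal sketch that uses only the standard library; the bounds are copied from this section and should be treated as assumptions to verify against the package metadata of the release you actually install. The helper names `check_environment` and `installed_version` are illustrative.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple

# Constraints as quoted in this document (assumptions, not read from PandasAI itself).
MIN_PYTHON = (3, 8)      # inclusive lower bound
MAX_PYTHON = (3, 12)     # exclusive upper bound
PINNED_PANDAS = "1.5.3"  # the pin the warning says may apply


def check_environment(python_version: Tuple[int, int],
                      pandas_version: Optional[str]) -> List[str]:
    """Return a list of human-readable compatibility problems (empty if none)."""
    problems = []
    if not (MIN_PYTHON <= python_version < MAX_PYTHON):
        problems.append(
            "Python %d.%d is outside the supported range >=3.8,<3.12" % python_version
        )
    if pandas_version is not None and pandas_version != PINNED_PANDAS:
        problems.append(
            "pandas %s is installed, but %s may be required" % (pandas_version, PINNED_PANDAS)
        )
    return problems


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for problem in check_environment(sys.version_info[:2], installed_version("pandas")):
    print(problem)
```

Running this in a fresh virtual environment before `pip install` makes a conflict visible early, rather than partway through dependency resolution.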
Install
- pip install pandasai pandasai-litellm
- pip install 'pandasai[excel]' 'pandasai[google-ai]' 'pandasai[sql][postgres]'
Imports
- pai
  wrong: from pandasai import PandasAI
  correct: import pandasai as pai
- LiteLLM
  wrong: from pandasai.llm.openai import OpenAI
  correct: from pandasai_litellm.litellm import LiteLLM
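Because v3 splits the LLM wrappers into separate packages, it can help to confirm that both packages are importable before running any code. This is a small sketch using the standard library's `importlib.util.find_spec`; the `missing_packages` helper is illustrative, and the package names are the top-level import names used in this document.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Top-level import names used by the v3 examples in this document.
REQUIRED = ["pandasai", "pandasai_litellm"]

missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
    print("Try: pip install pandasai pandasai-litellm")
else:
    print("All required packages are importable.")
```

Only top-level names are checked here, since `find_spec` on a dotted name such as `pandasai_litellm.litellm` raises an error when the parent package is absent.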
Quickstart
import os
import pandas as pd
import pandasai as pai
from pandasai_litellm.litellm import LiteLLM
# Set your API key from environment variable
openai_api_key = os.environ.get('OPENAI_API_KEY', 'YOUR_OPENAI_API_KEY')
# Initialize LiteLLM with your desired model
# Ensure the model name is correct and supported by your LiteLLM setup/API key
llm = LiteLLM(model="gpt-4o-mini", api_key=openai_api_key)
# Configure PandasAI globally with the LLM
pai.config.set({"llm": llm})
# Sample DataFrame
data = {
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4]
}
df = pd.DataFrame(data)
# Convert pandas DataFrame to PandasAI DataFrame
pai_df = pai.DataFrame(df)
# Chat with your data
response = pai_df.chat("Which are the top 3 countries by GDP?")
print(response)
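The warnings section notes that `pai.chat()` can query multiple dataframes directly, replacing `SmartDatalake`. The sketch below extends the quickstart to two related tables. It assumes an LLM has already been configured with `pai.config.set()` as shown above, and that `pai.chat()` accepts the question followed by the dataframes; the sample data and the `ask_across_tables` helper are hypothetical, so check the call against the PandasAI documentation for your installed version.

```python
# Hypothetical sample data: two tables that share an EmployeeID key.
employees = {
    "EmployeeID": [1, 2, 3],
    "Name": ["John", "Emma", "Liam"],
}
salaries = {
    "EmployeeID": [1, 2, 3],
    "Salary": [5000, 6000, 4500],
}


def ask_across_tables(question, *tables):
    """Send one question about several tables to PandasAI.

    Assumes an LLM was already configured globally via pai.config.set().
    """
    # Imported lazily so this module still loads where PandasAI is not installed.
    import pandasai as pai

    frames = [pai.DataFrame(table) for table in tables]
    return pai.chat(question, *frames)


# Usage (requires a configured LLM and network access):
# print(ask_across_tables("Who gets paid the most?", employees, salaries))
```

The call is left commented out because it sends a request to the configured LLM provider.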