pandas-stubs
pandas-stubs provides public type annotations for the pandas library as a separate stub package, following PEP 561. The stubs enable static type checking of pandas code; they focus on recommended usage patterns and favor soundness over completeness. In the current version, 3.0.0.260204, the first three components indicate that the stubs were tested against pandas 3.0.0, and the final component encodes the release date. Stub releases occur more often than pandas releases, reflecting the ongoing evolution of the type annotations.
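As a rough illustration of the version scheme, the sketch below splits a stub version string into the pandas version it was tested against and a release date. It assumes the layout x.y.z.yymmdd; the helper name split_stub_version is hypothetical and not part of pandas-stubs.

```python
from datetime import date


def split_stub_version(version: str) -> tuple[str, date]:
    """Split 'x.y.z.yymmdd' into the tested pandas version and a release date.

    Assumption: the last component is a six-digit yymmdd stamp.
    """
    major, minor, patch, stamp = version.split(".")
    if len(stamp) != 6 or not stamp.isdigit():
        raise ValueError(f"unexpected date component: {stamp!r}")
    released = date(2000 + int(stamp[:2]), int(stamp[2:4]), int(stamp[4:6]))
    return f"{major}.{minor}.{patch}", released


pandas_version, released = split_stub_version("3.0.0.260204")
print(pandas_version, released.isoformat())
```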
Warnings
- breaking: The current 3.0.x releases of pandas-stubs may not fully support all new features introduced in pandas 3.0. This can lead to unexpected type errors or missing type information for newer APIs.
- gotcha: pandas-stubs provides a 'narrower' set of type annotations compared to what pandas itself might allow, prioritizing recommended best practices and soundness over complete API coverage. This means some valid pandas code, especially less common patterns, might still raise type errors.
- gotcha: The typing goals for `pandas` itself (internal consistency) and `pandas-stubs` (public API usage) differ. This can occasionally lead to inconsistencies or divergences in type definitions between the two projects. While a long-term goal is to merge them, they remain separate for now.
- gotcha: Changes in pandas API or updates to pandas-stubs (e.g., how methods like `itertuples()` are typed) can introduce new type checker errors in existing code that previously passed without issue.
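When the stubs infer a wider or different type than a piece of valid code expects, one common way to cope is to narrow the value explicitly. The sketch below uses typing.cast and an explicit float() conversion. Whether a given expression actually needs this depends on the stub version and the type checker, so treat it as an illustration of the pattern and not as a description of any specific pandas-stubs behavior. The helper name column_total is hypothetical.

```python
from typing import cast

import pandas as pd


def column_total(df: pd.DataFrame, column: str) -> float:
    # At runtime, selecting a single column yields a Series. cast() states
    # that to the type checker in case it infers something wider.
    values = cast(pd.Series, df[column])
    # float() gives a plain Python float regardless of how sum() is typed.
    return float(values.sum())


sales_df = pd.DataFrame({"sales": [100.5, 150.2, 200.0]})
print(column_total(sales_df, "sales"))
```

cast() has no runtime effect, so it cannot hide a real bug at execution time, but it does silence the checker. Prefer it only where you are sure of the runtime type.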
Install
pip install pandas-stubs
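After installing, it can help to confirm which versions of pandas and pandas-stubs are present in the environment, since the two are released separately. A minimal sketch using the standard library's importlib.metadata follows; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    # Return the installed version string, or None if the distribution is absent.
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


for name in ("pandas", "pandas-stubs"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```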
Imports
- pandas
import pandas as pd
Quickstart
import pandas as pd


def analyze_data(df: pd.DataFrame) -> pd.Series:
    # Summing a numeric column is an operation the stubs type correctly.
    total_sales = df['sales'].sum()
    print(f"Total sales: {total_sales}")

    # Example that a type checker using pandas-stubs would flag:
    # DataFrame.round accepts an int, a dict, or a Series for 'decimals',
    # not another DataFrame. The lines stay commented out because the call
    # would also fail at runtime.
    # decimals_df = pd.DataFrame({'price': [2]})
    # df_rounded = df.round(decimals=decimals_df)

    # Correct usage: Series.round takes a plain int for 'decimals'.
    decimals_series = pd.Series({'price': 2, 'quantity': 0})
    df['price'] = df['price'].round(decimals=int(decimals_series.get('price', 0)))
    return df['price']


if __name__ == "__main__":
    data = {'item': ['A', 'B', 'C'], 'sales': [100.5, 150.2, 200.0], 'price': [10.234, 5.678, 12.345], 'quantity': [10, 20, 15]}
    sales_df = pd.DataFrame(data)
    processed_prices = analyze_data(sales_df)
    print(processed_prices)
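For comparison, the forms of 'decimals' that DataFrame.round does accept can be sketched as below: a single int for all columns, or a dict or Series mapping column names to per-column precision. The sample data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.234, 5.678], "quantity": [10.4, 20.6]})

# Per-column precision given as a dict keyed by column name.
by_dict = df.round({"price": 2, "quantity": 0})

# The same precision given as a Series indexed by column name.
by_series = df.round(pd.Series({"price": 2, "quantity": 0}))

# A single int applies the same precision to every numeric column.
uniform = df.round(1)

print(by_dict)
print(uniform)
```

Run a type checker such as mypy or pyright over a file like this with pandas-stubs installed to see which calls it accepts.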