pandas-vet
pandas-vet is a flake8 plugin that provides opinionated linting for pandas code. It helps enforce best practices and reduce common footguns when working with pandas DataFrames and Series by flagging problematic patterns and encouraging more robust and readable code. It is actively maintained and currently at version 2023.8.2, with a regular release cadence.
Common errors
- PD011 Use `.to_numpy()` instead of `.values`
  cause pandas-vet cannot tell what type an object has, so it flags `.values` even on objects that are not a pandas Series or Index, such as a named tuple or a PyTorch tensor. In those cases the warning is a false positive.
  fix If the object is genuinely not a pandas object, suppress the check on that line with a `# noqa: PD011` comment, or configure flake8 to ignore `PD011` if the false positive recurs throughout a non-pandas codebase. First confirm that it is not actually a pandas object, where `.to_numpy()` would be appropriate.
- flake8: command not found
  cause The `flake8` executable is not on your system's PATH, or `flake8` is not installed in the active Python environment.
  fix Install `flake8` in your environment (`pip install flake8`) and make sure the environment's `Scripts` or `bin` directory is on your PATH. If you use a virtual environment, make sure it is activated.
- ModuleNotFoundError: No module named 'pandas_vet'
  cause This error is uncommon, because `pandas-vet` is a flake8 plugin and is not normally imported directly. It can appear if you try to import it yourself, or if flake8 cannot find the installed plugin.
  fix Install `pandas-vet` in the same Python environment that `flake8` runs in (`pip install pandas-vet`). Run `flake8 --version` to confirm that `pandas-vet` appears in the list of installed plugins. Restart your IDE or editor if it runs an integrated linter.
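The PD011 false positive described above can be illustrated with a plain named tuple. This is a minimal sketch: `Histogram` and `hist` are hypothetical names chosen for the example, and it assumes pandas-vet reports any `.values` attribute access regardless of the object's type, as the entry above describes.

```python
from typing import NamedTuple


class Histogram(NamedTuple):
    """A plain container that happens to have a field named `values`."""

    edges: list
    values: list


hist = Histogram(edges=[0.0, 1.0, 2.0], values=[3, 5])

# Not a pandas object, so `.to_numpy()` does not apply here.
# The trailing comment suppresses PD011 for this line only.
counts = hist.values  # noqa: PD011
```

Keeping the suppression on the single line, rather than ignoring `PD011` globally, leaves the check active for real pandas objects elsewhere in the file.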
Warnings
- breaking Using `inplace=True` is strongly discouraged by pandas-vet (PD002) and increasingly by the pandas core team. It can lead to inconsistent behavior, prevent method chaining, and doesn't always provide performance benefits.
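- gotcha Accessing the underlying NumPy array through the `.values` attribute (PD011) is ambiguous, because the type it returns depends on the data. It is not formally deprecated, but the pandas documentation recommends `.to_numpy()` or `.array` instead.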
- gotcha pandas-vet enforces opinionated import styles and variable names. For example, importing pandas in any form other than `import pandas as pd` (PD001) or naming a DataFrame `df` (PD901) triggers a warning.
- deprecated pandas-vet flags several older or ambiguous methods in favor of more explicit counterparts: `.isnull` (PD003, use `.isna`), `.notnull` (PD004, use `.notna`), `.ix` (PD007, use `.loc` or `.iloc`), `.pivot` or `.unstack` (PD010, use `.pivot_table`), `.read_table` (PD012, use `.read_csv`), and `.stack` (PD013, use `.melt`).
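The warnings above can be sketched as a short set of rewrites. The column names and the variable names (`frame`, `trimmed`, and so on) are made up for illustration, and the example assumes pandas is installed; it shows the style pandas-vet steers toward rather than a guaranteed warning-free file.

```python
import pandas as pd  # PD001: the import form pandas-vet expects

# A descriptive name instead of `df` (PD901).
frame = pd.DataFrame({"col_a": [1.0, None, 3.0], "col_b": [10, 20, 30]})

# PD002: assign the result instead of passing inplace=True.
trimmed = frame.drop(columns="col_b")

# PD003 / PD004: .isna() and .notna() instead of .isnull() and .notnull().
missing_mask = trimmed["col_a"].isna()
present_mask = trimmed["col_a"].notna()

# PD011: .to_numpy() instead of the .values attribute.
col_b_array = frame["col_b"].to_numpy()

# PD013: .melt() instead of .stack() to reshape wide data to long form.
long_form = frame.melt(var_name="column", value_name="value")
```

Assigning the result of `drop` keeps the original `frame` intact, which is also what makes method chaining possible.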
Install
- pip install pandas-vet
- conda install -c conda-forge pandas-vet
Imports
- pandas-vet as a flake8 plugin
No direct import needed in user code; flake8 automatically discovers installed plugins.
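flake8 discovers plugins through the `flake8.extension` entry point group in installed package metadata. The sketch below lists what is registered in the current environment using only the standard library; `flake8_plugin_names` is a hypothetical helper name, and the expectation that pandas-vet appears under a name beginning with `PD` is an assumption based on its error code prefix.

```python
from importlib.metadata import entry_points


def flake8_plugin_names():
    """Return the sorted names registered in the flake8.extension group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = eps.select(group="flake8.extension")
    else:
        # Older versions return a dict keyed by group name.
        selected = eps.get("flake8.extension", [])
    return sorted({ep.name for ep in selected})


if __name__ == "__main__":
    names = flake8_plugin_names()
    print("flake8 plugins found:", names)
    # Assumption: pandas-vet registers under a name starting with "PD".
    print("pandas-vet visible:", any(n.startswith("PD") for n in names))
```

If the list is empty or has no `PD` entry, the plugin is most likely installed in a different environment from the one running this script.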
Quickstart
# my_pandas_script.py
import pandas

df = pandas.DataFrame({
    'col_a': [i for i in range(20)],
    'col_b': [j for j in range(20, 40)]
})
df.drop(columns='col_b', inplace=True)

# Run flake8 from your terminal in the same directory:
# flake8 my_pandas_script.py
# Expected output (may vary with the flake8 and pandas-vet versions and with which checks are enabled):
# my_pandas_script.py:2:1: PD001 pandas should always be imported as 'import pandas as pd'
# my_pandas_script.py:4:1: PD901 'df' is a bad variable name. Be kinder to your future self.
# my_pandas_script.py:8:1: PD002 'inplace = True' should be avoided; it has inconsistent behavior.
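The same check can be driven from Python, which also avoids the "command not found" problem by invoking flake8 as a module of the current interpreter. This is a sketch under assumptions: `build_flake8_command` and `run_pandas_vet` are hypothetical helper names, and `--select=PD` is used to restrict the report to codes with the `PD` prefix.

```python
import subprocess
import sys


def build_flake8_command(path, select="PD"):
    """Return an argv list that runs flake8 through the current interpreter."""
    return [sys.executable, "-m", "flake8", f"--select={select}", path]


def run_pandas_vet(path):
    """Run flake8 on `path` and return each reported line as a string."""
    result = subprocess.run(
        build_flake8_command(path),
        capture_output=True,
        text=True,
        check=False,  # flake8 exits non-zero when it reports violations
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_pandas_vet("my_pandas_script.py"):
        print(line)
```

Using `sys.executable` ensures flake8 and pandas-vet are looked up in the same environment as the script, which is the usual cause of a plugin not being found.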