{"id":7574,"library":"pyjanitor","title":"pyjanitor","description":"pyjanitor is a Python library that extends pandas DataFrames with a clean, user-friendly API for data cleaning and preprocessing. Inspired by the R `janitor` package, it supports common data wrangling tasks such as cleaning column names and handling missing values, all through method chaining. Currently at version 0.32.23, the library maintains an active development pace with frequent releases addressing performance, new features, and deprecations to align with evolving pandas APIs.","status":"active","version":"0.32.23","language":"en","source_language":"en","source_url":"https://github.com/pyjanitor-devs/pyjanitor","tags":["pandas","data cleaning","data science"],"install":[{"cmd":"pip install pyjanitor","lang":"bash","label":"PyPI"}],"dependencies":[{"reason":"Core dependency; pyjanitor extends pandas DataFrames.","package":"pandas"}],"imports":[{"note":"Imports pyjanitor's functionality, registering its methods as pandas DataFrame accessors/methods.","symbol":"janitor","correct":"import janitor"},{"note":"pandas is a prerequisite for pyjanitor's functionality.","symbol":"pandas","correct":"import pandas as pd"}],"quickstart":{"code":"import pandas as pd\nimport janitor\n\n# Sample DataFrame with messy column names\ndata = {\n    'First Name': ['Alice', 'Bob'],\n    'Last-Name': ['Smith', 'Johnson'],\n    'AGE (Years)': [24, 30]\n}\ndf = pd.DataFrame(data)\n\nprint(\"Original DataFrame:\\n\", df)\n\n# Clean column names using pyjanitor's clean_names()\ncleaned_df = df.clean_names()\n\nprint(\"\\nCleaned DataFrame:\\n\", cleaned_df)\nprint(\"\\nCleaned column names:\", cleaned_df.columns.tolist())","lang":"python","description":"This quickstart imports pyjanitor alongside pandas and uses the `clean_names()` method to standardize column headers in a DataFrame for easier manipulation. 
This method converts names to lowercase and replaces spaces and common punctuation, such as hyphens and parentheses, with underscores."},"warnings":[{"fix":"Use `pd.DataFrame.assign` for general column additions/modifications. For groupby operations, use the `assign` method directly on the groupby object, `df.groupby('col').assign(...)`, which was introduced in v0.32.18.","message":"The `mutate` DataFrame method has been deprecated. Users are advised to transition to alternative approaches for adding or modifying columns.","severity":"deprecated","affected_versions":">=0.32.17"},{"fix":"Call these methods directly on the `DataFrameGroupBy` object returned by `.groupby()`. Review documentation examples for updated patterns.","message":"The 'by' methods for groupby operations, previously called on DataFrames, are now available directly on groupby objects for improved API consistency.","severity":"breaking","affected_versions":">=0.32.20"},{"fix":"Prefer native pandas functions such as `pd.DataFrame.assign`, `pd.DataFrame.drop`, `pd.DataFrame.rename`, and `pd.DataFrame.query` where possible.","message":"Functions like `add_column`, `add_columns`, `remove_columns`, `rename_column`, `rename_columns`, and `filter_on` are slated for deprecation in a future 1.x release, as their functionality largely overlaps with native pandas methods.","severity":"deprecated","affected_versions":"Future 1.x releases (warnings may appear in 0.x)."}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure pyjanitor is installed in your active environment: `pip install pyjanitor`. 
If using virtual environments, activate the correct environment before running your code. Note that the package is installed as `pyjanitor` but imported as `janitor`; `import pyjanitor` fails even when the package is installed.","cause":"The pyjanitor package is not installed in the current Python environment, or the active environment is not the one where pyjanitor was installed.","error":"ModuleNotFoundError: No module named 'janitor'"},{"fix":"Add `import janitor` to your script after `import pandas as pd`. This registers pyjanitor's functions as DataFrame methods.","cause":"The `janitor` module was not imported, which means its DataFrame accessor methods have not been registered with pandas.","error":"AttributeError: 'DataFrame' object has no attribute 'clean_names'"},{"fix":"For operations on grouped DataFrames, use `df.groupby(...).assign(...)` instead of `mutate`. Refer to pyjanitor's documentation for the correct methods available on GroupBy objects for your version.","cause":"Calling a deprecated pyjanitor method on a pandas GroupBy object, or using a pyjanitor version that predates the methods added to GroupBy objects.","error":"AttributeError: 'DataFrameGroupBy' object has no attribute 'mutate' (or similar for other deprecated methods on groupby objects)"}]}