pyjanitor

0.32.23 · active · verified Thu Apr 16

pyjanitor is a Python library that extends pandas DataFrames with a clean, user-friendly API for data cleaning and preprocessing. Inspired by the R `janitor` package, it covers common data wrangling tasks such as cleaning column names and handling missing values, and exposes them as DataFrame methods designed for method chaining. At version 0.32.23, the library is actively developed, with frequent releases that improve performance, add features, and deprecate functionality to keep pace with evolving pandas APIs.

Quickstart

After installing pyjanitor (published on PyPI as `pyjanitor` and imported as `janitor`), this quickstart imports it alongside pandas and uses the `clean_names()` method to standardize the column headers of a DataFrame so they are easier to work with. By default, the method converts names to lowercase and replaces spaces and common punctuation, such as hyphens and parentheses, with underscores.

import pandas as pd
import janitor  # importing janitor registers clean_names() and other methods on pandas DataFrames

# Sample DataFrame with messy column names
data = {
    'First Name': ['Alice', 'Bob'],
    'Last-Name': ['Smith', 'Johnson'],
    'AGE (Years)': [24, 30]
}
df = pd.DataFrame(data)

print("Original DataFrame:\n", df)

# Clean column names using pyjanitor's clean_names()
cleaned_df = df.clean_names()

print("\nCleaned DataFrame:\n", cleaned_df)
print("\nCleaned column names:", cleaned_df.columns.tolist())
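To make the renaming rule concrete, the sketch below approximates it with the standard library only. It is not pyjanitor's implementation: the helper name `approx_clean_name` and the exact set of replaced characters are assumptions chosen for illustration, and the real `clean_names()` may behave differently in edge cases or under non-default options.

```python
import re


def approx_clean_name(name: str) -> str:
    """Hypothetical helper: roughly mimic the renaming rule described above."""
    # Step 1: lowercase the whole name.
    lowered = name.lower()
    # Step 2: replace spaces and a small, assumed set of punctuation with underscores.
    replaced = re.sub(r"[ /:,?()\.\-]", "_", lowered)
    # Step 3: collapse runs of underscores into a single underscore.
    return re.sub(r"_+", "_", replaced)


columns = ["First Name", "Last-Name", "AGE (Years)"]
approx = [approx_clean_name(c) for c in columns]
print(approx)
```

For the quickstart's column names, this sketch yields `first_name`, `last_name`, and `age_years_`. The trailing underscore comes from the closing parenthesis; pyjanitor's `clean_names()` documents a `strip_underscores` option for trimming such edges, so check the documentation for your installed version before relying on a particular result.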
