pandas-schema

0.3.6 | verified Fri May 01 | auth: no | python | maintenance

A validation library for pandas DataFrames that uses user-friendly schemas. The current version is 0.3.6; releases are infrequent.

pip install pandas-schema
error ImportError: cannot import name 'Check' from 'pandas_schema'
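cause pandas-schema has no Check class (Check belongs to the separate pandera library), and its validators live in the pandas_schema.validation submodule rather than at package level.
fix
For a custom per-cell rule, use: from pandas_schema.validation import CustomElementValidation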
error AttributeError: 'DataFrame' object has no attribute 'validate'
cause pandas DataFrames have no validate method; in pandas-schema, validation is a method of the schema object.
fix
Build a Schema from a list of Column objects and call schema.validate(df).
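gotcha Column takes a column name and a list of validations, not a type. Types are checked through validations: CanConvertValidation expects a Python type (e.g. int, float) whose constructor is called on each cell, so a dtype string such as 'int64' is not accepted.
fix Use CanConvertValidation(int), CanConvertValidation(float), etc.; to check the column's dtype itself, use IsDtypeValidation.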
gotcha Validators are in pandas_schema.validation; the top-level pandas_schema package provides Schema and Column but not the validation classes.
fix Use from pandas_schema.validation import CanConvertValidation, InRangeValidation, etc.
gotcha InRangeValidation crashes on non-numeric text in versions <=0.3.5. This is fixed in 0.3.6.
fix Upgrade to 0.3.6, or convert the column with pd.to_numeric(..., errors='coerce') before validation.

Define a schema and validate a DataFrame. schema.validate(df) returns a list of ValidationWarning objects, which is empty when every check passes.

import pandas as pd
from pandas_schema import Column, Schema
from pandas_schema.validation import CanConvertValidation

# A Schema is a list of Column objects; each Column names a DataFrame
# column and lists the validations to run on it.
schema = Schema([
    Column('age', [CanConvertValidation(int)])
])

# The values are strings, but each one must be convertible with int().
df = pd.DataFrame({'age': ['25', '30']})

errors = schema.validate(df)
if errors:
    for error in errors:
        print(error)
else:
    print('Validation passed')