Cuallee

0.15.4 verified Fri May 01 auth: no python

Cuallee is a Python library for validating data through DataFrame APIs, including Snowflake/Snowpark, Apache PySpark, and Pandas. It provides declarative check rules for data quality. The current version is 0.15.4 and requires Python >=3.10. The project is under active development with frequent releases.

pip install cuallee
error AttributeError: 'DuckDbPyRelation' object has no attribute 'columns'
cause Passing a DuckDB relation object directly without registering it first.
fix
Register the relation with DuckDB before passing it, e.g. duckdb.register('temp', relation), then use the registered name or a connection.
error ModuleNotFoundError: No module named 'cuallee'
cause The library is not installed in the Python environment that runs the code.
fix
Run pip install cuallee.
error TypeError: Check.validate() got an unexpected keyword argument 'verbose'
cause Calling validate() with the 'verbose' parameter from an older API; the parameter was removed in newer versions.
fix
Remove 'verbose' from the validate() call and print the returned results instead, e.g. print(results).
gotcha Check.ok() runs the validation again, so calling it after validate() computes everything twice; store the results if you need both the detailed results and a pass/fail answer.
fix Store the validation results with `result = check.validate(df)`, then derive pass/fail from them, e.g. `(result['status'] == 'PASS').all()` for a Pandas result.
gotcha DuckDB relation objects must be registered before they are passed to cuallee; otherwise validation fails with AttributeError: 'DuckDbPyRelation' object has no attribute 'columns'.
fix Use `duckdb.register('my_view', relation)` and then pass the registered name or the connection's table.
breaking Python 3.9 support was dropped; Python >=3.10 is required as of version 0.14.1.
fix Upgrade Python to 3.10 or higher.
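
Two of the entries above (the ModuleNotFoundError and the Python >=3.10 requirement) can be checked before any cuallee code runs. The following is a minimal sketch using only the standard library; the `preflight` helper and the `MIN_PYTHON` constant are hypothetical names introduced for illustration and are not part of cuallee.

```python
import importlib.util
import sys

# Minimum interpreter version stated above for cuallee >= 0.14.1.
MIN_PYTHON = (3, 10)


def preflight(package="cuallee", version_info=None):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: it only checks the interpreter version and whether
    the package can be located, not whether the package works.
    """
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} is older than "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec(package) is None:
        problems.append(f"{package!r} is not importable from {sys.executable}")
    return problems


for problem in preflight():
    print(problem)
```

Printing `sys.executable` in the message helps when the package was installed into a different interpreter or virtual environment than the one running the script.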

Basic usage: create a Check, add rules, validate a DataFrame, and check pass/fail.

import pandas as pd
from cuallee import Check, CheckLevel

# Create a sample DataFrame
df = pd.DataFrame({'id': [1, 2, 3], 'value': [10, 20, 30]})

# Define check rules
check = Check(CheckLevel.WARNING, 'my_check')
check.is_complete('id')
check.is_unique('id')
check.is_greater_than('value', 5)

# Validate and get results
results = check.validate(df)
print(results)

# ok() runs the validation on df again and reports whether all checks passed
if check.ok(df):
    print("All checks passed!")
else:
    print("Some checks failed.")
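
The sample above calls validate() and then ok(), which validates the data twice according to the gotcha listed earlier. A possible alternative is to derive pass/fail from the stored results. The sketch below assumes, as that gotcha suggests, that the results expose a `status` column whose passing value is the string 'PASS'; `all_passed` is a hypothetical helper, not a cuallee function, and the lists are made-up stand-ins for that column.

```python
def all_passed(statuses):
    """Return True when every status equals 'PASS'.

    Note: an empty input also returns True, because there is nothing failing.
    """
    return all(status == "PASS" for status in statuses)


# Made-up status values standing in for the results' status column.
print(all_passed(["PASS", "PASS", "PASS"]))  # True
print(all_passed(["PASS", "FAIL", "PASS"]))  # False
```

With a Pandas result this could be called as `all_passed(results["status"])`, since iterating a Series yields its values.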