Cuallee
Version 0.15.4 (verified Fri May 01). Auth required: no. Language: Python.
Cuallee is a Python library for data-quality validation on DataFrame APIs, including Snowflake Snowpark, Apache PySpark, and pandas. You declare check rules, and cuallee evaluates them against a DataFrame. The current version, 0.15.4, requires Python >=3.10. The project is under active development, with frequent releases.
pip install cuallee
Common errors
error AttributeError: 'DuckDbPyRelation' object has no attribute 'columns'
cause Passing a DuckDB relation object directly without registering it first.
fix
Register the relation with DuckDB before passing it to cuallee:
duckdb.register('temp', relation), then validate against the registered name or a connection.
error ModuleNotFoundError: No module named 'cuallee'
cause The library is not installed in the active Python environment.
fix
Run
pip install cuallee
error TypeError: Check.validate() got an unexpected keyword argument 'verbose'
cause Calling validate() with the 'verbose' parameter from an older API; newer versions removed it.
fix
Remove 'verbose' from the validate() call and use
print(results) to inspect the output instead.
Warnings
gotcha Check.ok() runs the validation again, so if you need both the detailed results and an overall pass/fail, validate once and store the results.
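fix Store validation results with `result = check.validate(df)`, then use `(result.status == 'PASS').all()` to get the overall pass/fail. Compare before reducing: filtering to the 'PASS' rows first and then calling .all() is truthy even when some rules fail.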
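To turn stored results into a single pass/fail, compare each status first and then reduce. The minimal pandas sketch below uses a hypothetical results table in place of a real `check.validate(df)` output. It assumes one row per rule and a 'status' column holding 'PASS' or 'FAIL'. It shows why filtering to the 'PASS' rows before calling .all() gives a misleading answer.

```python
import pandas as pd

# Hypothetical stand-in for check.validate(df) output (assumed shape):
# one row per rule, with a 'status' column holding 'PASS' or 'FAIL'.
result = pd.DataFrame({
    'rule': ['is_complete', 'is_unique', 'is_greater_than'],
    'status': ['PASS', 'PASS', 'FAIL'],
})

# Filtering first keeps only the 'PASS' strings; non-empty strings are
# truthy, so this is True even though one rule failed.
misleading = bool(result.status[result.status == 'PASS'].all())

# Compare first, then reduce: True only if every rule passed.
all_passed = bool((result.status == 'PASS').all())

print(misleading, all_passed)  # prints: True False
```

Wrapping the reductions in bool() keeps the answer a plain Python boolean, which is convenient for `if` statements or exit codes in a pipeline.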
gotcha DuckDB relation objects must be registered before they are passed to cuallee; otherwise validation fails with: 'DuckDbPyRelation' object has no attribute 'columns'.
fix Use `duckdb.register('my_view', relation)` and then pass the registered name or the connection's table.
breaking Python 3.9 support was dropped in version 0.14.1; cuallee now requires Python >=3.10.
fix Upgrade Python to 3.10 or higher.
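The Python >=3.10 requirement and the ModuleNotFoundError described under Common errors can both be caught before `import cuallee` runs. Below is a small sketch that uses only the standard library. The `preflight()` helper and its messages are hypothetical and are not part of cuallee.

```python
from __future__ import annotations

import importlib.util
import sys

# cuallee dropped Python 3.9 in 0.14.1 and requires Python >= 3.10.
MIN_PYTHON = (3, 10)


def preflight() -> list[str]:
    """Hypothetical helper: list problems that would stop cuallee from importing."""
    problems: list[str] = []
    if sys.version_info < MIN_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} is too old; cuallee needs 3.10 or newer.")
    # find_spec returns None when the package cannot be located on sys.path.
    if importlib.util.find_spec("cuallee") is None:
        problems.append("cuallee is not installed; run: pip install cuallee")
    return problems


for problem in preflight():
    print(problem)
```

An empty list means both requirements are met. Calling this at the top of a job script gives a readable message instead of a traceback.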
Imports
- Check
from cuallee import Check
- CheckLevel
from cuallee import CheckLevel
Quickstart
import pandas as pd
from cuallee import Check, CheckLevel
# Create a sample DataFrame
df = pd.DataFrame({'id': [1, 2, 3], 'value': [10, 20, 30]})
# Define check rules
check = Check(CheckLevel.WARNING, 'my_check')
check.is_complete('id')
check.is_unique('id')
check.is_greater_than('value', 5)
# Validate and get results
results = check.validate(df)
print(results)
# Use ok() to check if all passed
if check.ok(df):
    print("All checks passed!")
else:
    print("Some checks failed.")