Woodwork Data Typing Library

0.31.0 · active · verified Fri Apr 17

Woodwork is a data typing library for machine learning that extends pandas DataFrames and Series with logical types and semantic tags. It supports automatic type inference, validation, and schema management for data pipelines. The current version is 0.31.0, and the library is actively maintained by Alteryx.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart initializes Woodwork on a pandas DataFrame so that it infers a logical type for each column. It then prints the resulting schema, the logical types of all columns, and the logical type of the 'email' column.

import pandas as pd
import woodwork as ww

data = {
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "email": ["alice@example.com", "bob@example.com", "charlie@example.com"]
}
df = pd.DataFrame(data)

# Initialize Woodwork on the DataFrame to infer types and create a schema
df.ww.init()

print(df.ww.schema)
print(df.ww.logical_types)
print(df.ww['email'].ww.logical_type)
