Woodwork Data Typing Library
Woodwork is a data typing library for machine learning that extends pandas DataFrames and Series with semantic and logical types. It provides automatic type inference, validation, and schema management for robust data pipelines. Currently at version 0.31.0, it is actively maintained by Alteryx and updated frequently.
Common errors
- AttributeError: 'DataFrame' object has no attribute 'ww'
  cause: The `woodwork` package has not been imported, so its `ww` accessor is not registered on pandas DataFrames. Accessing Woodwork properties before `df.ww.init()` has been called also fails.
  fix: Put `import woodwork as ww` at the top of your script. If you are trying to access Woodwork properties on a DataFrame that has not been initialized, call `df.ww.init()` first.
- ValueError: Invalid LogicalType specified for column '...' with value '...'
  cause: The assigned logical type is incompatible with the column's underlying data, or the logical type string is not recognized.
  fix: Verify that the data in the column can be interpreted as the specified logical type. Check the `woodwork.logical_types` module for valid logical type names, for example 'Double' for floats, 'Integer' for integers, and 'Categorical' for strings/categories.
- TypeError: Cannot infer LogicalType from '...'
  cause: Woodwork could not automatically determine a suitable logical type for a column from its data.
  fix: Specify the logical type for the problematic column manually with `df.ww.set_types(logical_types={'column_name': 'YourLogicalType'})` after calling `df.ww.init()`.
Warnings
- breaking The global `woodwork.init()` function was removed and replaced by the DataFrame accessor method `df.ww.init()`.
- breaking The default value for `infer_box_type_on_init` parameter in `df.ww.init()` changed from `True` to `False`.
- breaking Support for Python 3.8 was dropped.
- deprecated The `WoodworkColumnAccessor.set_logical_type()` method was deprecated.
Install
- pip install woodwork
Imports
- DataFrameAccessor
import woodwork as ww # ... then use df.ww.method()
- init
  outdated: from woodwork.api import init; init(df)
  current: df.ww.init()
- LogicalType
from woodwork.logical_types import LogicalType
- Categorical
from woodwork.logical_types import Categorical
Quickstart
import pandas as pd
import woodwork as ww
data = {
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "email": ["alice@example.com", "bob@example.com", "charlie@example.com"],
}
df = pd.DataFrame(data)
# Initialize Woodwork on the DataFrame to infer types and create a schema
df.ww.init()
print(df.ww.schema)
print(df.ww.logical_types)
print(df.ww['email'].ww.logical_type)