Formulaic (Python for Wilkinson Formulas)

1.2.1 · active · verified Thu Apr 09

Formulaic is a high-performance Python library that implements Wilkinson formulas for statistical modeling. It simplifies feature engineering with an extensible formula parser and fast conversion of dataframes into model matrices. Supported input and output formats include pandas DataFrames, NumPy arrays, SciPy sparse matrices, and Narwhals dataframes. The library is actively maintained; the current version is 1.2.1.

Warnings

Install

Imports

Quickstart

This example demonstrates how to create design matrices using Wilkinson formulas from a pandas DataFrame. It shows both the explicit `Formula` class approach and the `model_matrix` shorthand function.

import pandas
from formulaic import Formula, model_matrix

df = pandas.DataFrame({
    'y': [0, 1, 2],
    'x': ['A', 'B', 'C'],
    'z': [0.3, 0.1, 0.2],
})

print("Using Formula class (recommended for advanced use/reuse):")
f = Formula('y ~ x + z')
y_formula, X_formula = f.get_model_matrix(df)
print("Response (y) from Formula:\n", y_formula)
print("Design Matrix (X) from Formula:\n", X_formula)

print("\nUsing model_matrix shorthand:")
y_short, X_short = model_matrix('y ~ x + z', df)
print("Response (y) from model_matrix:\n", y_short)
print("Design Matrix (X) from model_matrix:\n", X_short)
