{"id":2036,"library":"formulaic","title":"Formulaic (Python for Wilkinson Formulas)","description":"Formulaic is a high-performance Python library that implements Wilkinson formulas for statistical modeling. It simplifies feature engineering by providing extensible formula parsing and fast dataframe-to-model-matrix conversion. It supports several input and output data formats, including pandas DataFrames, NumPy arrays, SciPy sparse matrices, and Narwhals dataframes. The library is actively maintained; the current version is 1.2.1.","status":"active","version":"1.2.1","language":"en","source_language":"en","source_url":"https://github.com/matthewwardrop/formulaic","tags":["scientific computing","data transformation","statistical modeling","feature engineering","formulas","Wilkinson formulas"],"install":[{"cmd":"pip install formulaic","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"Installed with formulaic; commonly used for input dataframes and used throughout the examples.","package":"pandas","optional":false},{"reason":"Installed with formulaic; used for numerical operations and as an output format option.","package":"numpy","optional":false},{"reason":"Installed with formulaic; used for the sparse matrix output option.","package":"scipy","optional":false}],"imports":[{"note":"Use this for explicit formula object creation, allowing for inspection and reuse of the model specification.","symbol":"Formula","correct":"from formulaic import Formula"},{"note":"A convenience function for one-off model matrix generation.","symbol":"model_matrix","correct":"from formulaic import model_matrix"}],"quickstart":{"code":"import pandas\nfrom formulaic import Formula, model_matrix\n\ndf = pandas.DataFrame({\n    'y': [0, 1, 2],\n    'x': ['A', 'B', 'C'],\n    'z': [0.3, 0.1, 0.2],\n})\n\nprint(\"Using Formula class (recommended for advanced use/reuse):\")\nf = Formula('y ~ x + z')\ny_formula, X_formula = f.get_model_matrix(df)\nprint(\"Response (y) from Formula:\\n\", y_formula)\nprint(\"Design Matrix (X) from Formula:\\n\", 
X_formula)\n\nprint(\"\\nUsing model_matrix shorthand:\")\ny_short, X_short = model_matrix('y ~ x + z', df)\nprint(\"Response (y) from model_matrix:\\n\", y_short)\nprint(\"Design Matrix (X) from model_matrix:\\n\", X_short)","lang":"python","description":"This example demonstrates how to create design matrices using Wilkinson formulas from a pandas DataFrame. It shows both the explicit `Formula` class approach and the `model_matrix` shorthand function."},"warnings":[{"fix":"To remove the intercept, use `y ~ -1 + x + z`. To specify different contrasts, refer to the official documentation on 'Contrasts'.","message":"Following Wilkinson formula conventions, Formulaic automatically adds an intercept term unless it is explicitly removed, and by default uses treatment coding for categorical variables. This may differ from what you expect if you are coming from other statistical packages or from manual feature engineering.","severity":"gotcha","affected_versions":"All 1.x.x versions"},{"fix":"For complex workflows or production, prefer `f = Formula('...'); y, X = f.get_model_matrix(df)`, then keep `f` and the `ModelSpec` attached to the output (`X.model_spec`) and reuse it on new data, e.g. `X.model_spec.get_model_matrix(new_df)`.","message":"While `model_matrix` provides a convenient shorthand, prefer `Formula('...').get_model_matrix()` when you need to inspect the parsed formula structure or reuse the generated `ModelSpec` so that the same transformations are applied across multiple datasets (e.g., training and test data).","severity":"gotcha","affected_versions":"All 1.x.x versions"},{"fix":"Pass input data as a `pandas.DataFrame` where possible, and make sure column data types suit the desired transformations (e.g., numeric for arithmetic operations, string/category for categorical encoding).","message":"Formulaic is optimized for working with tabular data, most commonly `pandas.DataFrame` for input. 
While it supports other data structures (NumPy arrays, SciPy sparse matrices, Narwhals dataframes), mixing input formats or passing columns with unexpected data types (for example, numbers stored as strings) can lead to errors.","severity":"gotcha","affected_versions":"All 1.x.x versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}