Tabmat
Tabmat provides efficient matrix representations for tabular data and can build them directly from several dataframe libraries. Its specialized matrix types, such as DenseMatrix, CategoricalMatrix, and SplitMatrix, target performance-critical statistical and machine learning workloads, in particular fitting generalized linear models. The current version is 4.2.1; the project is actively developed, with frequent releases for bug fixes and new features.
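Why a dedicated categorical type pays off can be illustrated with plain NumPy. A one-hot encoded block has exactly one nonzero per row, so `X @ beta` reduces to looking up `beta` by category code, and `X.T @ v` reduces to summing `v` within each category. The sketch below shows only that general idea with made-up data; it is not tabmat's implementation.

```python
import numpy as np

# Hypothetical example data: category codes for 6 rows, 3 levels
codes = np.array([0, 1, 0, 2, 2, 1])
n_levels = 3
beta = np.array([0.5, -1.0, 2.0])  # one coefficient per level

# Dense approach: materialize the full one-hot matrix, then multiply
one_hot = np.zeros((codes.size, n_levels))
one_hot[np.arange(codes.size), codes] = 1.0
dense_result = one_hot @ beta

# Categorical approach: X @ beta is a lookup of beta by code
categorical_result = beta[codes]

# X.T @ v: sum v within each category (a weighted bincount), no one-hot needed
v = np.arange(codes.size, dtype=float)
dense_t = one_hot.T @ v
categorical_t = np.bincount(codes, weights=v, minlength=n_levels)

assert np.allclose(dense_result, categorical_result)
assert np.allclose(dense_t, categorical_t)
print(categorical_result)
```

The categorical path never allocates the `n_rows x n_levels` one-hot matrix, which is where the memory and time savings come from when a column has many levels.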
Common errors
- AttributeError: 'DenseMatrix' object has no attribute 'A'
  Cause: Code accesses the underlying NumPy array through the `.A` attribute, which no longer works since tabmat v4.0.0, when `DenseMatrix` stopped inheriting from `np.ndarray`.
  Fix: Use `dense_matrix.unpack()` or `dense_matrix.toarray()` to obtain the underlying NumPy array.
- TypeError: can't convert DenseMatrix to numpy.ndarray implicitly
  Cause: A `DenseMatrix` is passed where a `numpy.ndarray` is expected. Since tabmat v4.0.0, `DenseMatrix` no longer inherits from `np.ndarray`.
  Fix: Convert explicitly with `dense_matrix.unpack()` or `dense_matrix.toarray()` before calling functions that expect an `np.ndarray`.
- ERROR: Package 'tabmat' requires a different Python: 3.X.Y not in '>=3.10'
  Cause: tabmat 4.2.0 or later is being installed on, or used with, a Python version older than 3.10.
  Fix: Upgrade to Python 3.10 or newer. If that is not possible, install an older tabmat release: `pip install "tabmat<4.2.0"`.
- ValueError: buffer source array is read-only
  Cause: In older tabmat versions, some `CategoricalMatrix` methods and related internal routines failed when given an immutable (read-only) NumPy array or buffer.
  Fix: Upgrade to tabmat 4.2.1 or newer (4.1.3 in the 4.1 series), which can operate on read-only buffers. If you cannot upgrade, pass a writable copy instead, e.g. `my_array.copy(order="C")`.
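For tabmat versions that still fail on read-only buffers, the copy workaround can be applied only when it is actually needed, so writable inputs are not duplicated. A minimal sketch using plain NumPy; the helper name `ensure_writable` is hypothetical and not part of tabmat.

```python
import numpy as np


def ensure_writable(arr: np.ndarray) -> np.ndarray:
    """Return arr unchanged if it is writable, else a C-contiguous writable copy.

    Hypothetical helper: only needed for tabmat versions without the read-only fix.
    """
    if arr.flags.writeable:
        return arr  # no copy, no extra memory
    return arr.copy(order="C")  # a fresh copy owns its data and is writable


# Simulate a read-only input, e.g. a memory-mapped file or an array shared by another library
data = np.array([1.0, 2.0, 3.0])
data.setflags(write=False)

safe = ensure_writable(data)
print(data.flags.writeable, safe.flags.writeable)  # False True
```

Checking `flags.writeable` first keeps the common case free, which matters for large design matrices.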
Warnings
- breaking As of v4.0.0, `DenseMatrix` and `SparseMatrix` no longer inherit from `numpy.ndarray` and `scipy.sparse.csc_matrix` respectively. Direct array-like access (e.g., `.A`) or implicit conversion will now fail.
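- breaking Tabmat v4.2.0 and later require Python 3.10 or newer. On older interpreters, pip skips these releases when resolving an unpinned `pip install tabmat` and installs the newest compatible version instead; explicitly requesting 4.2.0 or later fails with a Python-version error.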
- gotcha In versions before 4.2.1 (4.1.3 in the 4.1 series), `CategoricalMatrix` methods and related internal functions may raise `ValueError: buffer source array is read-only` when given read-only buffers (e.g., NumPy arrays with `writeable=False`).
- gotcha `tabmat.from_df` and `tabmat.from_formula` use the `narwhals` v2 API and accept a wider range of dataframes, including `polars`. This improves compatibility, but code that relied on `pandas`-specific quirks or on the behavior of older `narwhals` APIs may see subtle behavioral changes.
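Code that must run across the v4.0.0 boundary can route every conversion through one explicit helper instead of relying on `ndarray` inheritance. A minimal sketch, assuming only that tabmat matrices expose `toarray()` as described in the Common errors entries; the helper `as_ndarray` and the `StandInMatrix` class are hypothetical, and the stand-in merely lets the example run without tabmat installed.

```python
import numpy as np


def as_ndarray(x) -> np.ndarray:
    """Return a dense NumPy array for an ndarray or a tabmat-style matrix (hypothetical helper)."""
    if isinstance(x, np.ndarray):
        return x  # already an array: no conversion or copy
    # tabmat >= 4.0 matrices no longer subclass np.ndarray, so convert explicitly.
    # Assumption: the object provides toarray(), which returns a dense array.
    if hasattr(x, "toarray"):
        return np.asarray(x.toarray())
    raise TypeError(f"cannot convert {type(x).__name__} to numpy.ndarray")


# Usage with a stand-in object (hypothetical; replace with a real tabmat matrix)
class StandInMatrix:
    def toarray(self):
        return [[1.0, 0.0], [0.0, 1.0]]


arr = as_ndarray(StandInMatrix())
print(type(arr).__name__, arr.shape)  # ndarray (2, 2)
```

Note that `toarray()` densifies, which can be expensive for large sparse or categorical matrices; call it only where a dense `np.ndarray` is really required.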
Install
pip install tabmat
Imports
- from_df
from tabmat import from_df
- from_formula
from tabmat import from_formula
- DenseMatrix
from tabmat import DenseMatrix
- CategoricalMatrix
from tabmat import CategoricalMatrix
- SplitMatrix
from tabmat import SplitMatrix
Quickstart
import numpy as np
import pandas as pd
import tabmat as tm
# Create a sample DataFrame; the string column gets a categorical dtype
# so that it is encoded as a categorical
df = pd.DataFrame({
    "numeric_col": [1.0, 2.0, 3.0, 4.0],
    "categorical_col": pd.Categorical(["A", "B", "A", "C"]),
    "bool_col": [True, False, True, False],
})
# Build a SplitMatrix from the DataFrame, dropping the first level
# of each categorical column
matrix = tm.from_df(df, drop_first=True)
print(f"Matrix shape: {matrix.shape}")
print(f"Sub-matrices (e.g., DenseMatrix, CategoricalMatrix): {matrix.matrices}")
# Standardization is a matrix method rather than a from_df argument; it returns
# the standardized matrix together with the column means and standard deviations
weights = np.full(matrix.shape[0], 1.0 / matrix.shape[0])
std_matrix, col_means, col_stds = matrix.standardize(weights, True, True)
# Matrix-vector product X @ vec
vec = np.random.default_rng(0).random(matrix.shape[1])
result = matrix.matvec(vec)
print(f"Result of matvec: {result}")
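Conceptually, a split matrix keeps each group of columns in the representation that suits it and computes products block by block, scattering each block's share of the vector by column index. The NumPy sketch below mimics that for `matvec` and checks it against the fully materialized matrix; the block layout, column positions, and data are hypothetical and do not reflect tabmat's internals.

```python
import numpy as np

# Hypothetical column blocks: a dense numeric block and a categorical block stored as codes
dense_block = np.array([[1.0, 1.0], [2.0, 0.0], [3.0, 1.0], [4.0, 0.0]])
cat_codes = np.array([0, 1, 0, 2])  # e.g. levels A, B, A, C
n_levels = 3

# Column positions of each block in the full (virtual) 4 x 5 matrix
dense_idx = np.array([0, 3])
cat_idx = np.array([1, 2, 4])


def split_matvec(vec: np.ndarray) -> np.ndarray:
    """Compute X @ vec block by block, then sum the partial results."""
    out = dense_block @ vec[dense_idx]  # dense block: ordinary matvec
    out += vec[cat_idx][cat_codes]  # categorical block: lookup by code
    return out


# Materialize the full matrix only to check the blockwise result
full = np.zeros((4, 5))
full[:, dense_idx] = dense_block
full[np.arange(4), cat_idx[cat_codes]] = 1.0

vec = np.arange(1.0, 6.0)
assert np.allclose(split_matvec(vec), full @ vec)
print(split_matvec(vec))
```

Each block only ever touches the entries of `vec` for its own columns, so adding a new storage type means adding one more partial product rather than changing the others.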