Tabmat

4.2.1 · active · verified Fri Apr 17

Tabmat provides efficient matrix representations for tabular data and can build them directly from dataframes. Its specialized matrix types store each kind of column in a suitable layout: DenseMatrix for dense numeric data, SparseMatrix for mostly-zero columns, and CategoricalMatrix for categorical columns, which it stores as integer codes rather than as an expanded one-hot array. SplitMatrix combines several of these column-wise into a single matrix. The types target performance-critical statistical and machine learning workloads, especially generalized linear models. The current version is 4.2.1; the project is actively developed, with frequent releases for bug fixes and new features.

Quickstart

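This quickstart builds a tabmat matrix from a pandas DataFrame with `tabmat.from_df`, which chooses a storage type for each column based on its dtype: numeric and boolean columns become dense or sparse blocks, and categorical columns are one-hot encoded (`drop_first=True` drops the first level of each). String columns should be cast to the pandas `category` dtype first so they are treated as categorical. With mixed column types the result is typically a `SplitMatrix`. `from_df` does not standardize; standardization is a separate step through the matrix's `standardize` method. The example ends with a matrix-vector multiplication, one of the core operations on `tabmat` objects.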

import numpy as np
import pandas as pd
import tabmat as tm

# Create a sample DataFrame; the string column is cast to the pandas
# "category" dtype so tabmat treats it as categorical
df = pd.DataFrame({"numeric_col": [1, 2, 3, 4],
                   "categorical_col": pd.Categorical(["A", "B", "A", "C"]),
                   "bool_col": [True, False, True, False]})

# Convert the DataFrame to a tabmat matrix, dropping the first level of
# each categorical column. Standardization is not done here; it is a
# separate step (matrix.standardize).
matrix = tm.from_df(df, drop_first=True)

print(f"Matrix type: {type(matrix).__name__}, shape: {matrix.shape}")
if isinstance(matrix, tm.SplitMatrix):
    # A SplitMatrix keeps each group of columns in its own sub-matrix
    print(f"Matrix parts (e.g., DenseMatrix, CategoricalMatrix): {matrix.matrices}")

# Matrix-vector multiplication: a vector with one entry per column
# yields one value per row
vec = np.random.rand(matrix.shape[1])
result = matrix.matvec(vec)
print(f"Result of matvec: {result}")
