MABWiser

Version: 2.7.4 | Verified: Fri May 01 | Auth: no | Language: Python

MABWiser is a Python library for parallelizable, contextual multi-armed bandits. It supports a wide range of bandit learning policies (e.g., epsilon-greedy, Thompson Sampling, LinUCB) and neighborhood policies for contextual bandits. Version 2.7.4 is the latest release at the time of writing, and the project is under active development.

pip install mabwiser
error AttributeError: module 'mabwiser' has no attribute 'MAB'
cause Accessing `MAB` on the top-level `mabwiser` package instead of importing it from the `mabwiser.mab` submodule.
fix Use `from mabwiser.mab import MAB`.
error TypeError: 'LearningPolicy' object is not callable
cause Calling `LearningPolicy` like a function with a string argument instead of using one of its policy attributes directly.
fix Use `LearningPolicy.EpsilonGreedy(epsilon=0.1)`.
error ValueError: The truth value of an array with more than one element is ambiguous
cause Passing a full context matrix to `predict()` on a non-contextual bandit.
fix For a non-contextual bandit, call `predict()` with no arguments or with an empty array.
breaking In version 2.4.0, the `arm_to_scaler` argument (a dict of pre-trained scalers) was replaced by a boolean `scale` flag, so code that passes `arm_to_scaler` breaks.
fix Replace the `arm_to_scaler` argument with `scale=True` and let MABWiser fit the scalers internally.
breaking The `np.Inf` alias was removed in NumPy 2.0, which matters when running MABWiser 2.7.4 on NumPy 2.x. Code referencing `np.Inf` raises AttributeError.
fix Replace `np.Inf` with `np.inf`.
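A minimal sketch of the replacement, assuming only that NumPy is installed: `np.inf` exists in both NumPy 1.x and 2.x, so it is the portable spelling. The `min_cost` helper is hypothetical and exists only to show a typical use of an infinity sentinel.

```python
import numpy as np

# Canonical spelling: works on NumPy 1.x and 2.x.
# The capitalized alias np.Inf was removed in NumPy 2.0 and raises AttributeError there.
INF = np.inf


def min_cost(costs):
    """Hypothetical helper: smallest cost seen so far, or +inf if there are none."""
    best = INF  # previously written as np.Inf
    for c in costs:
        best = min(best, c)
    return best


print(min_cost([3.0, 1.5, 2.0]))
print(min_cost([]))
# False on NumPy >= 2.0, where the alias no longer exists
print("np.Inf still defined:", hasattr(np, "Inf"))
```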
deprecated Direct use of `LearningPolicy` strings (e.g., `LearningPolicy('epsilon_greedy')`) is deprecated in favor of the enum-like `LearningPolicy.EpsilonGreedy` objects.
fix Use `LearningPolicy.EpsilonGreedy(epsilon=0.1)` instead of `LearningPolicy('epsilon_greedy')`.
gotcha In recent versions (>= 2.4.0), `MAB.predict()` on a non-contextual policy expects no context (`None`) or an empty context array; passing a non-empty context raises an error.
fix Call `mab.predict()` without arguments, or pass an empty array such as `np.array([[]])` for batch predictions.

Minimal example: a non-contextual epsilon-greedy bandit updated online with `partial_fit`, and a contextual bandit using the `Clusters` neighborhood policy.

import numpy as np
from mabwiser.mab import MAB, LearningPolicy, NeighborhoodPolicy

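# Non-contextual bandit
arms = ['arm1', 'arm2']
mab = MAB(arms, LearningPolicy.EpsilonGreedy(epsilon=0.1))
# Warm start with one observation per arm: predict() raises if fit() has not been called
mab.fit(decisions=['arm1', 'arm2'], rewards=[1, 0])
# Simulate online learning with dummy Bernoulli rewards (arm1 pays off more often)
for _ in range(100):
    arm = mab.predict()
    reward = np.random.binomial(1, 0.7 if arm == 'arm1' else 0.3)
    # partial_fit expects list-like decisions and rewards, even for a single observation
    mab.partial_fit([arm], [reward])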

# Contextual bandit with the Clusters neighborhood policy
contexts = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
mab_ctx = MAB(arms, LearningPolicy.EpsilonGreedy(epsilon=0.1), NeighborhoodPolicy.Clusters(n_clusters=2))
# fit() takes decisions and rewards first; contexts is the third argument
mab_ctx.fit(decisions=np.array(['arm1', 'arm2', 'arm1']), rewards=np.array([1, 0, 1]), contexts=contexts)
print(mab_ctx.predict(contexts[-1:]))
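To make the learning policy concrete, here is a plain-Python sketch of the epsilon-greedy rule that `LearningPolicy.EpsilonGreedy(epsilon=0.1)` configures: with probability epsilon pick a random arm, otherwise pick the arm with the highest mean observed reward. This is an illustrative reimplementation built around a hypothetical `EpsilonGreedySketch` class, not MABWiser's internal code; the tie-breaking rule and the 0.0 default for unseen arms are assumptions.

```python
import random


class EpsilonGreedySketch:
    """Hypothetical, minimal epsilon-greedy bandit (not MABWiser's implementation)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.sums = {arm: 0.0 for arm in self.arms}

    def expectations(self):
        # Mean observed reward per arm; unseen arms default to 0.0 (an assumption)
        return {arm: (self.sums[arm] / self.counts[arm]) if self.counts[arm] else 0.0
                for arm in self.arms}

    def predict(self):
        # Explore: with probability epsilon, pick a uniformly random arm
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        # Exploit: pick the best mean; ties go to the first arm in list order
        exp = self.expectations()
        return max(self.arms, key=lambda arm: exp[arm])

    def partial_fit(self, decisions, rewards):
        # Update running sums and counts, one (decision, reward) pair at a time
        for arm, reward in zip(decisions, rewards):
            self.counts[arm] += 1
            self.sums[arm] += reward


# Usage: same simulation as above, with a seeded RNG for reproducibility
sim_rng = random.Random(1)
bandit = EpsilonGreedySketch(["arm1", "arm2"], epsilon=0.1, seed=0)
bandit.partial_fit(["arm1", "arm2"], [1, 0])  # warm start, one observation per arm
for _ in range(1000):
    chosen = bandit.predict()
    payoff = 1 if sim_rng.random() < (0.7 if chosen == "arm1" else 0.3) else 0
    bandit.partial_fit([chosen], [payoff])
print(bandit.expectations())
print(bandit.counts)
```

Setting `epsilon=0.0` turns the sketch into a purely greedy policy, and `epsilon=1.0` into uniform random exploration, which is a quick way to sanity-check the two branches.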