MABWiser
MABWiser is a Python library for parallelizable, contextual multi-armed bandits. It supports a wide range of bandit learning policies (e.g., epsilon-greedy, Thompson Sampling, LinUCB) and neighborhood policies for contextual bandits. Version 2.7.4 is the latest release, with active development.
pip install mabwiser

Common errors
error: AttributeError: module 'mabwiser' has no attribute 'MAB'
cause: Importing MAB from the top-level package instead of the mab submodule.
fix: Use `from mabwiser.mab import MAB`
error: TypeError: 'LearningPolicy' object is not callable
cause: Using LearningPolicy as a function with a string argument instead of using the enum attribute directly.
fix: Use `LearningPolicy.EpsilonGreedy(epsilon=0.1)`
error: ValueError: The truth value of an array with more than one element is ambiguous
cause: Passing a full context matrix to predict() for a non-contextual bandit.
fix: For a non-contextual bandit, call predict() with no arguments or an empty array.
Warnings
breaking: In version 2.4.0, the scaler argument changed from a pre-trained scaler dict to a boolean `scale` flag. Code using `arm_to_scaler` will break.
fix: Replace the `arm_to_scaler` argument with `scale=True` and let MABWiser fit scalers internally.
breaking: The `np.Inf` alias no longer exists (NumPy 2.0 dropped it); code referencing `np.Inf` will raise AttributeError.
fix: Replace `np.Inf` with `np.inf`.
deprecated: Direct use of `LearningPolicy` strings (e.g., `LearningPolicy('epsilon_greedy')`) is deprecated in favor of the enum-like `LearningPolicy.EpsilonGreedy` objects.
fix: Use `LearningPolicy.EpsilonGreedy(epsilon=0.1)` instead of `LearningPolicy('epsilon_greedy')`.
gotcha: MAB.predict() for non-contextual policies expects an empty or None context array in recent versions (>=2.4.0). Providing a non-empty context will raise an error.
fix: Call `mab.predict()` without arguments, or pass an empty array such as `np.array([[]])`.
Imports
- MAB
  wrong:   from mabwiser import MAB
  correct: from mabwiser.mab import MAB
- LearningPolicy
  from mabwiser.mab import LearningPolicy
- NeighborhoodPolicy
  from mabwiser.mab import NeighborhoodPolicy
Quickstart
import numpy as np
from mabwiser.mab import MAB, LearningPolicy, NeighborhoodPolicy

# Non-contextual bandit
arms = ['arm1', 'arm2']
mab = MAB(arms, LearningPolicy.EpsilonGreedy(epsilon=0.1))

# Warm-start with one observation per arm (predict before fit raises an error),
# then learn online with partial_fit
mab.fit(decisions=['arm1', 'arm2'], rewards=[1, 0])
for _ in range(100):
    arm = mab.predict()
    reward = np.random.binomial(1, 0.7 if arm == 'arm1' else 0.3)
    mab.partial_fit([arm], [reward])

# Contextual bandit with the Clusters neighborhood policy
contexts = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
mab_ctx = MAB(arms, LearningPolicy.EpsilonGreedy(epsilon=0.1),
              NeighborhoodPolicy.Clusters(n_clusters=2))
mab_ctx.fit(decisions=['arm1', 'arm2', 'arm1'], rewards=[1, 0, 1], contexts=contexts)
print(mab_ctx.predict(contexts[-1:]))