MABWiser

Version 2.7.4 · verified Fri May 01 · auth: none · language: Python

MABWiser is a Python library for parallelizable, context-free and contextual multi-armed bandits. It supports a wide range of learning policies (e.g., epsilon-greedy, Thompson Sampling, LinUCB) and neighborhood policies (e.g., clustering, k-nearest neighbors) for contextual bandits. Version 2.7.4 is the latest release as of the verification date above, and the project is under active development.
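To make the policy names above concrete, here is a minimal, library-independent sketch of the epsilon-greedy rule: with probability epsilon pick a random arm, otherwise pick the arm with the highest observed mean reward. It illustrates the technique only and is not MABWiser's internal implementation; the class name `EpsilonGreedySketch` is hypothetical.

```python
import random


class EpsilonGreedySketch:
    """Hypothetical, minimal epsilon-greedy bandit (not MABWiser's code)."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in self.arms}
        self.sums = {arm: 0.0 for arm in self.arms}

    def mean(self, arm):
        # Arms with no observations get an expectation of 0.0
        return self.sums[arm] / self.counts[arm] if self.counts[arm] else 0.0

    def predict(self):
        # Explore with probability epsilon, otherwise exploit the best mean
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=self.mean)

    def partial_fit(self, arm, reward):
        # Incrementally update the running totals for the chosen arm
        self.counts[arm] += 1
        self.sums[arm] += reward


bandit = EpsilonGreedySketch(['arm1', 'arm2'], epsilon=0.1, seed=0)
bandit.partial_fit('arm1', 1)
bandit.partial_fit('arm2', 0)
print(bandit.mean('arm1'), bandit.mean('arm2'))  # 1.0 0.0
```

MABWiser's `LearningPolicy.EpsilonGreedy` exposes the same epsilon trade-off through the `MAB` class, so in practice you would use the library rather than a hand-rolled loop like this.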

pip install mabwiser
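After installing, one way to confirm which version is active is the standard-library `importlib.metadata` module (Python 3.8+). The helper name `installed_mabwiser_version` is hypothetical; it returns `None` when the package is not installed instead of raising.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_mabwiser_version():
    """Return the installed mabwiser version string, or None if it is missing."""
    try:
        return version("mabwiser")
    except PackageNotFoundError:
        return None


print(installed_mabwiser_version())  # the installed version string, or None
```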

Common errors

error AttributeError: module 'mabwiser' has no attribute 'MAB'

cause Importing MAB from the top-level package instead of the mab submodule.

fix

Use: from mabwiser.mab import MAB

error TypeError: 'LearningPolicy' object is not callable

cause Calling LearningPolicy itself with a string argument instead of instantiating one of its nested policy classes (e.g., LearningPolicy.EpsilonGreedy).

fix

Use: LearningPolicy.EpsilonGreedy(epsilon=0.1)

error ValueError: The truth value of an array with more than one element is ambiguous

cause Passing a full context matrix to predict() for a non-contextual bandit.

fix

For a non-contextual bandit, call predict() with no arguments or an empty array.

Warnings

breaking In version 2.4.0, the scaler argument of the linear learning policies (e.g., LinUCB) changed from a dict of pre-trained scalers (`arm_to_scaler`) to a boolean `scale` flag. Code that passes `arm_to_scaler` will break.

fix Replace the `arm_to_scaler` argument with `scale=True` and let MABWiser fit the scalers internally.

breaking The `np.Inf` alias was removed in NumPy 2.0 (an upstream NumPy change, not a MABWiser API change). Code referencing `np.Inf` raises AttributeError under NumPy 2.x.

fix Replace `np.Inf` with `np.inf`.
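A small example of the replacement: lowercase `np.inf` exists in both NumPy 1.x and 2.x, so code like the hypothetical `best_arm` helper below keeps working after an upgrade.

```python
import numpy as np


def best_arm(expectations):
    """Hypothetical helper: return the arm with the highest expected reward."""
    best, best_value = None, -np.inf  # previously -np.Inf, which NumPy 2.0 removed
    for arm, value in expectations.items():
        if value > best_value:
            best, best_value = arm, value
    return best


print(best_arm({'arm1': 0.4, 'arm2': 0.7}))  # arm2
```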

deprecated Direct use of `LearningPolicy` strings (e.g., `LearningPolicy('epsilon_greedy')`) is deprecated in favor of the enum-like `LearningPolicy.EpsilonGreedy` objects.

fix Use `LearningPolicy.EpsilonGreedy(epsilon=0.1)` instead of `LearningPolicy('epsilon_greedy')`.

gotcha MAB.predict() for non-contextual policies expects an empty or None context array when using recent versions (>=2.4.0). Providing a non-empty context will raise an error.

fix Call `mab.predict()` without arguments or pass an empty array like `np.array([[]])` for batch predictions.

Imports

MAB
wrong
```
from mabwiser import MAB
```
correct
```
from mabwiser.mab import MAB
```
MAB is not exposed at the package level; it's in the mab submodule.
LearningPolicy
```
from mabwiser.mab import LearningPolicy
```
Commonly used to specify learning policies like LearningPolicy.EpsilonGreedy.
NeighborhoodPolicy
```
from mabwiser.mab import NeighborhoodPolicy
```
Used for contextual bandits with neighborhood policies such as k-nearest neighbors or clustering.

Quickstart

Minimal example: a non-contextual epsilon-greedy bandit trained online with partial_fit, and a contextual bandit using the Clusters neighborhood policy.

```
import numpy as np
from mabwiser.mab import MAB, LearningPolicy, NeighborhoodPolicy

# Non-contextual bandit
arms = ['arm1', 'arm2']
mab = MAB(arms, LearningPolicy.EpsilonGreedy(epsilon=0.1))
# predict() requires a prior fit(), so warm-start with one observation per arm
mab.fit(decisions=['arm1', 'arm2'], rewards=[1, 0])
# Simulate online learning with dummy Bernoulli rewards
for _ in range(100):
    arm = mab.predict()
    reward = np.random.binomial(1, 0.7 if arm == 'arm1' else 0.3)
    mab.partial_fit([arm], [reward])  # decisions and rewards must be list-like

# Contextual bandit with a clustering-based neighborhood
contexts = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
mab_ctx = MAB(arms, LearningPolicy.EpsilonGreedy(epsilon=0.1),
              NeighborhoodPolicy.Clusters(n_clusters=2))
# fit() takes decisions, rewards, then contexts
mab_ctx.fit(decisions=np.array(['arm1', 'arm2', 'arm1']),
            rewards=np.array([1, 0, 1]),
            contexts=contexts)
print(mab_ctx.predict(contexts[-1:]))
```