Hyperopt
Hyperopt is a Python library for distributed asynchronous hyperparameter optimization, enabling optimization over awkward search spaces with real-valued, discrete, and conditional dimensions. It implements sequential model-based (Bayesian) optimization via the Tree-structured Parzen Estimator (TPE), alongside plain random search, to efficiently find good hyperparameters for machine learning models. The current PyPI version is 0.2.7.
Warnings
- breaking The open-source version of Hyperopt is no longer actively maintained. Databricks, a notable user, states this explicitly and recommends alternatives such as Optuna for single-node optimization or Ray Tune for distributed tuning.
- gotcha When using `hp.choice()` for categorical parameters, Hyperopt stores and returns the *index* of the chosen option from the list, not the actual value. Use `hyperopt.space_eval()` to map the returned indices back to the real values.
- gotcha If the objective function passed to `fmin()` returns `NaN` (e.g. from a diverged training run), Hyperopt records `NaN` as the loss, which can silently derail the optimization. Guard against this by detecting the failure and returning `{'status': STATUS_FAIL}` instead.
- gotcha When using `SparkTrials` for distributed hyperparameter tuning, Hyperopt fixes the degree of parallelism when the run starts. It will *not* adapt to changes in cluster size if the cluster autoscales mid-run.
- gotcha Due to its use of stochastic search algorithms, Hyperopt's reported loss does not necessarily decrease monotonically with each evaluation. This is expected behavior and does not indicate an issue.
Install
-
pip install hyperopt
Imports
- fmin
from hyperopt import fmin
- tpe
from hyperopt import tpe
- hp
from hyperopt import hp
- Trials
from hyperopt import Trials
Quickstart
import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
# 1. Define the objective function to minimize
def objective(args):
x, y = args
return {'loss': x ** 2 + y ** 2, 'status': STATUS_OK}
# 2. Define the search space
space = [
hp.uniform('x', -10, 10),
hp.uniform('y', -10, 10)
]
# 3. Create a Trials object to store results
trials = Trials()
# 4. Run the optimization (rstate makes the run reproducible)
best = fmin(objective, space, algo=tpe.suggest, max_evals=100, trials=trials,
            rstate=np.random.default_rng(42))
print("Best parameters found:", best)
print("Best loss found:", trials.best_trial['result']['loss'])