Bayesian Optimization (Optuna/Hyperopt)
Bayesian optimization treats hyperparameter tuning as a sequential decision problem: it builds a probabilistic model of the objective function and uses that model to choose the most promising configuration to try next. As a result, it typically reaches good solutions in far fewer evaluations than grid or random search.
How Bayesian Optimization Works
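A surrogate model (commonly a Gaussian Process or a Tree-structured Parzen Estimator) approximates the unknown objective function from the trials observed so far. An acquisition function then scores candidate points, balancing exploration of uncertain regions against exploitation of regions already known to perform well, and the highest-scoring point becomes the next trial.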
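As one concrete illustration, here is Expected Improvement (EI), a widely used acquisition function. The text above does not name a specific acquisition function, so EI is an assumption chosen for illustration. The formula is written for minimization and assumes the surrogate gives a Gaussian posterior with mean \mu(x) and standard deviation \sigma(x) > 0, with f^* the best value observed so far:

```latex
\mathrm{EI}(x) = \mathbb{E}\big[\max(f^* - f(x),\, 0)\big]
             = \big(f^* - \mu(x)\big)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{f^* - \mu(x)}{\sigma(x)}
```

Here \Phi and \varphi are the standard normal CDF and PDF. The first term grows where the predicted mean beats the current best (exploitation). The second grows where the model is uncertain (exploration).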
Tree-structured Parzen Estimator (TPE)
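Optuna uses TPE as its default sampler, and Hyperopt exposes it as tpe.suggest. TPE models p(hyperparams | good results) and p(hyperparams | bad results) separately, where "good" means the best-scoring fraction of the trials seen so far. It draws candidates from the "good" density and keeps the one with the highest ratio of good to bad density. This works well on discrete and conditional search spaces, and it avoids the cost of exact Gaussian Process inference, which grows cubically with the number of observed trials.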
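The idea can be sketched in plain Python for a single continuous parameter. This is a simplified illustration, not the implementation used by Optuna or Hyperopt: the fixed kernel bandwidth, the `gamma` split fraction, and the names `kde_pdf` and `tpe_suggest` are all assumptions made for this example.

```python
import math
import random


def kde_pdf(x, centers, bandwidth):
    """Density at x of an equal-weight Gaussian mixture centered on `centers`."""
    norm = len(centers) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers) / norm


def tpe_suggest(history, low, high, gamma=0.25, n_candidates=24, rng=random):
    """Suggest the next x from a list of (x, loss) pairs; lower loss is better."""
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, math.ceil(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]        # best gamma fraction of trials
    bad = [x for x, _ in ranked[n_good:]] or good  # everything else
    bandwidth = (high - low) / 10.0                # fixed bandwidth (simplification)

    best_x, best_ratio = None, -1.0
    for _ in range(n_candidates):
        # Sample from the "good" density: pick a good center, add Gaussian noise.
        cand = rng.gauss(rng.choice(good), bandwidth)
        cand = min(max(cand, low), high)           # clip to the search bounds
        # Score by the ratio of good density to bad density.
        ratio = kde_pdf(cand, good, bandwidth) / (kde_pdf(cand, bad, bandwidth) + 1e-12)
        if ratio > best_ratio:
            best_x, best_ratio = cand, ratio
    return best_x


def objective(x):
    """Toy objective with its minimum at x = 2 (hypothetical)."""
    return (x - 2.0) ** 2


rng = random.Random(0)
# Start with a few random trials, then let the TPE sketch choose the rest.
history = [(x, objective(x)) for x in (rng.uniform(-10.0, 10.0) for _ in range(10))]
initial_best = min(loss for _, loss in history)
for _ in range(40):
    x = tpe_suggest(history, -10.0, 10.0, rng=rng)
    history.append((x, objective(x)))

best_x, best_loss = min(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, loss = {best_loss:.4f}")
```

Production implementations add a prior over the search space, adaptive bandwidths, and support for categorical and conditional parameters, all of which this sketch omits.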
Hyperparameter Tuning with Optuna
Optuna uses a define-by-run API: you write a Python function that defines the search space and returns a score, and Optuna handles the optimization loop.
Optuna with Scikit-Learn
Pruning Unpromising Trials
Comparing Tuning Strategies
For a fixed budget of evaluations, Bayesian optimization typically outperforms both grid and random search by focusing evaluations in high-performance regions of the space.
When to Choose Bayesian Optimization
- Each model evaluation is expensive (deep learning, large datasets)
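- The search space is continuous, mixed, or conditional, with too many dimensions to cover with a grid (very high-dimensional spaces remain hard for any surrogate model)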
- You can afford only 50–200 total trials and need the most from them
- Conversely, prefer grid or random search for cheap models, where parallelism compensates for naïve sampling