Bayesian Optimization (Optuna/Hyperopt)

Bayesian optimization treats hyperparameter tuning as a sequential decision problem: it builds a probabilistic model of the objective function from the trials evaluated so far and uses that model to choose the most promising configuration to try next. Because each new trial is informed by all previous ones, it usually reaches good configurations in fewer evaluations than grid or random search.


How Bayesian Optimization Works

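A surrogate model (commonly a Gaussian Process or a Tree-structured Parzen Estimator) approximates the unknown objective function from the trials evaluated so far. An acquisition function, such as expected improvement, then balances exploration (trying regions where the surrogate is uncertain) against exploitation (trying regions where it predicts good results) to select the next trial point.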
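To make the exploration/exploitation trade-off concrete, here is a minimal sketch of one common acquisition function, expected improvement, for a maximization problem. It assumes the surrogate returns a Gaussian predictive mean and standard deviation at each candidate; the candidate names and numbers are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Expected improvement over `best` when maximizing.

    mu and sigma are the surrogate's predictive mean and standard
    deviation at the candidate point.
    """
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


# Hypothetical surrogate predictions (mean, std) at three candidates.
candidates = {
    "near_best": (0.90, 0.01),  # predicted as good as the incumbent, low uncertainty
    "uncertain": (0.85, 0.10),  # slightly worse mean, high uncertainty
    "poor": (0.60, 0.02),       # clearly worse, low uncertainty
}
best_so_far = 0.90

scores = {
    name: expected_improvement(mu, sd, best_so_far)
    for name, (mu, sd) in candidates.items()
}
next_point = max(scores, key=scores.get)
print(scores, next_point)
```

With these illustrative numbers the uncertain candidate scores highest even though its predicted mean is lower than the incumbent's: its wide predictive distribution leaves a real chance of a large improvement, which is the exploration side of the trade-off.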

Tree-structured Parzen Estimator (TPE)

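Optuna and Hyperopt default to TPE. Instead of modelling the score directly, TPE fits two densities over the hyperparameters: l(x) = p(hyperparams | good results) and g(x) = p(hyperparams | bad results), where "good" means the best-scoring fraction of the trials so far. It then draws candidates from l(x) and evaluates the one with the highest ratio l(x)/g(x). This makes it well suited to discrete and conditional search spaces, and it avoids the cost of exact Gaussian Process inference, which grows cubically with the number of trials.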
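The idea can be sketched in a few lines of standard-library Python. This is a deliberately simplified one-dimensional illustration, not the implementation used by Optuna or Hyperopt: the fixed bandwidth, the `gamma` split fraction, the unbounded search range, and the toy `loss` function are all assumptions made for the example.

```python
import math
import random


def kde(points, bandwidth):
    """Return a 1-D Gaussian kernel density estimate over `points`."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def tpe_suggest(history, rng, gamma=0.25, n_candidates=24, bandwidth=0.5):
    """Suggest the next x from a list of (x, loss) pairs; lower loss is better."""
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, math.ceil(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]   # best-scoring fraction of trials
    bad = [x for x, _ in ranked[n_good:]]    # everything else
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # Sample from the "good" density: pick a kernel centre, then add noise.
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]
    # Keep the candidate where "good" is most likely relative to "bad".
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))


def loss(x):
    """Toy objective with its minimum at x = 2."""
    return (x - 2.0) ** 2


rng = random.Random(0)
# Start with a few random trials so both densities have data.
history = [(x, loss(x)) for x in (rng.uniform(-5.0, 5.0) for _ in range(10))]
for _ in range(30):
    x = tpe_suggest(history, rng)
    history.append((x, loss(x)))

best_x, best_loss = min(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, loss = {best_loss:.5f}")
```

Each suggestion costs only a pass over the stored trials, which is why this style of model scales more gently than an exact Gaussian Process as the trial count grows.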

Hyperparameter Tuning with Optuna

Optuna uses a define-by-run API: you write a Python function that defines the search space and returns a score, and Optuna handles the optimization loop.

Optuna with Scikit-Learn

<pre><code class="language-python">import optuna
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)


def objective(trial):
    # The search space is declared inline, one suggest_* call per hyperparameter
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 400),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = GradientBoostingClassifier(**params, random_state=42)
    # Score each configuration with 5-fold cross-validated ROC-AUC
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    return scores.mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50, show_progress_bar=True)

print("Best params:", study.best_params)
print("Best ROC-AUC:", study.best_value)</code></pre>

Pruning Unpromising Trials

<pre><code class="language-python">import optuna

# Optuna can prune trials mid-training: the objective reports intermediate
# scores with trial.report() and stops early when trial.should_prune() is True.
# Integration callbacks do this automatically for iterative estimators
# like XGBoost or LightGBM.

# Visualise the optimisation history of the study created above (requires plotly)
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()</code></pre>

Comparing Tuning Strategies

For a fixed budget of evaluations, Bayesian optimization typically outperforms both grid and random search by focusing evaluations in high-performance regions of the space.

When to Choose Bayesian Optimization

  • Each model evaluation is expensive (deep learning, large datasets)
  • The search space is continuous, mixed, or conditional, and too large to cover with a grid
  • You can afford only 50–200 total trials and need the most from them
  • Conversely, prefer grid or random search for cheap models, where running many trials in parallel compensates for naive sampling