Sampling from Distributions
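Unlike grid search, which evaluates fixed lists of values, randomized search can also draw each hyperparameter from a probability distribution (for example, those in scipy.stats). This frees it from a handful of preset values per parameter and allows finer coverage of the search space.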
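As a minimal sketch of how such distribution objects behave, the snippet below draws values from three scipy.stats distributions. The ranges are illustrative assumptions rather than recommendations, and `loguniform` is included only as an example of a distribution suited to parameters that span several orders of magnitude. Note that `uniform(loc, scale)` covers `[loc, loc + scale]` and that `randint(low, high)` excludes `high`.

```python
from scipy.stats import loguniform, randint, uniform

# Illustrative ranges only (assumptions, not tuned recommendations).
depth_dist = randint(2, 20)          # integers 2..19; the upper bound is exclusive
frac_dist = uniform(0.1, 0.9)        # floats in [0.1, 1.0]: loc=0.1, scale=0.9
rate_dist = loguniform(1e-4, 1e-1)   # floats spread evenly across orders of magnitude

# A randomized search draws each candidate value through the .rvs() method.
depth_samples = depth_dist.rvs(size=1000, random_state=0)
frac_samples = frac_dist.rvs(size=1000, random_state=0)
rate_samples = rate_dist.rvs(size=1000, random_state=0)

print("max_depth range:   ", depth_samples.min(), depth_samples.max())
print("fraction range:    ", frac_samples.min(), frac_samples.max())
print("log-uniform range: ", rate_samples.min(), rate_samples.max())
```

Any object exposing an `rvs` method can be used in the same way, so custom distributions are possible as well.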
Basic Usage with Distributions
<pre><code class="language-python">from scipy.stats import randint, uniform
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Distributions are sampled via .rvs(); lists are sampled uniformly at random.
param_dist = {
    "n_estimators": randint(50, 500),       # integers in [50, 499]
    "max_depth": randint(2, 20),            # integers in [2, 19]
    "max_features": uniform(0.1, 0.9),      # uniform(loc, scale): floats in [0.1, 1.0]
    "min_samples_split": randint(2, 20),    # integers in [2, 19]
    "bootstrap": [True, False],
}

rand_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=100,          # number of sampled configurations
    cv=5,
    scoring="roc_auc",
    n_jobs=-1,
    random_state=42,
    verbose=1,
)
rand_search.fit(X, y)

print(rand_search.best_params_)
print(rand_search.best_score_)</code></pre>
Efficiency vs. Exhaustiveness
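Research by Bergstra and Bengio (2012) shows that random search finds configurations as good as or better than those found by grid search while evaluating far fewer points. The reason is that, for a given problem, usually only a few hyperparameters strongly affect performance: a grid repeats the same few values of each important hyperparameter many times, whereas random search tries a new value of every hyperparameter on each trial.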
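This intuition can be sketched with a toy two-parameter objective in which only one parameter matters. The function `score` and the 3 x 3 grid are hypothetical, chosen purely for illustration: with the same budget of nine evaluations, the grid tries three distinct values of the important parameter, while random sampling tries nine.

```python
import itertools
import random


def score(important, unimportant):
    """Hypothetical objective: peaks at important = 0.37 and ignores the other input."""
    return 1.0 - (important - 0.37) ** 2


budget = 9

# Grid search: 3 values per axis gives 9 points but only 3 distinct 'important' values.
axis = [0.0, 0.5, 1.0]
grid_points = list(itertools.product(axis, axis))

# Random search: 9 points, each carrying its own 'important' value.
rng = random.Random(0)
random_points = [(rng.random(), rng.random()) for _ in range(budget)]

grid_distinct = len({p[0] for p in grid_points})
random_distinct = len({p[0] for p in random_points})

print("distinct important values, grid:  ", grid_distinct)
print("distinct important values, random:", random_distinct)
print("best grid score:  ", max(score(*p) for p in grid_points))
print("best random score:", max(score(*p) for p in random_points))
```

The best scores depend on the seed and on where the optimum happens to lie, so a single run is an illustration of the coverage argument, not evidence of a guaranteed win.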
Choosing n_iter
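A practical rule of thumb: start with n_iter=50 for a budget-constrained run and increase to 200 or more for a more thorough exploration. Always set random_state so that the sampled configurations are reproducible. The total compute cost is n_iter × cv model fits, plus one final refit of the best configuration on the full data when refit=True (the default).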
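The budget arithmetic can be sketched with two small helpers. Both `total_fits` and `prob_hit_top_region` are hypothetical names introduced here for illustration and are not part of scikit-learn. The second one rests on an idealised assumption: each draw independently has a `top_fraction` chance of landing in the best-performing region of the search space.

```python
def total_fits(n_iter, cv, refit=True):
    """Hypothetical helper: n_iter candidates x cv folds, plus one final fit
    on the full data when refit=True (scikit-learn's default)."""
    return n_iter * cv + (1 if refit else 0)


def prob_hit_top_region(n_iter, top_fraction=0.05):
    """Hypothetical helper: chance that at least one of n_iter independent draws
    lands in a region holding top_fraction of the sampling probability."""
    return 1.0 - (1.0 - top_fraction) ** n_iter


for n in (50, 200):
    fits = total_fits(n, cv=5)
    chance = prob_hit_top_region(n)
    print(f"n_iter={n}: {fits} fits, P(hit top 5%) = {chance:.3f}")
```

Under that assumption the chance of hitting the top region grows quickly with n_iter and then flattens, which is one way to reason about when extra iterations stop paying off.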
Comparing Grid vs. Random Search
<pre><code class="language-python">from sklearn.model_selection import GridSearchCV

# Grid: 4 x 5 x 3 = 60 combinations x 5 folds = 300 fits
param_grid = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [2, 5, 10, 15, 20],
    "max_features": ["sqrt", "log2", None],
}

# Random: n_iter=10 x 5 folds = 50 fits, one sixth of the grid's cost.
# Setting n_iter=60 in RandomizedSearchCV would match the grid's 300 fits.</code></pre>
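The fit counts above can be checked without training any model. The sketch below uses scikit-learn's `ParameterGrid` and `ParameterSampler` to enumerate the candidates each strategy would evaluate. The `param_dist` ranges are assumptions chosen to span the same values as the grid.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_grid = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [2, 5, 10, 15, 20],
    "max_features": ["sqrt", "log2", None],
}

# Assumed distributions covering the same ranges as the grid (upper bounds exclusive).
param_dist = {
    "n_estimators": randint(50, 401),
    "max_depth": randint(2, 21),
    "max_features": ["sqrt", "log2", None],
}

cv = 5
grid_candidates = list(ParameterGrid(param_grid))
random_candidates = list(ParameterSampler(param_dist, n_iter=10, random_state=42))

grid_fits = len(grid_candidates) * cv
random_fits = len(random_candidates) * cv

print("grid candidates:  ", len(grid_candidates), "->", grid_fits, "fits")
print("random candidates:", len(random_candidates), "->", random_fits, "fits")
```

These counts exclude the single refit on the full data that both search classes perform by default.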
Post-Search Analysis
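The cv_results_ attribute records every sampled configuration together with its per-fold scores, mean test score, rank and fit times. This full sampling history is useful for visualising which hyperparameter ranges performed best.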
Visualising the Search
<pre><code class="language-python">import matplotlib.pyplot as plt
import pandas as pd

# One row per sampled configuration.
results = pd.DataFrame(rand_search.cv_results_)

# param_* columns can come back with object dtype, so cast before plotting.
results["n_estimators"] = results["param_n_estimators"].astype(int)

plt.scatter(results["n_estimators"], results["mean_test_score"], alpha=0.5)
plt.xlabel("n_estimators")
plt.ylabel("Mean ROC-AUC (CV)")
plt.title("Random Search: n_estimators vs. Score")
plt.tight_layout()
plt.show()</code></pre>
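Alongside a plot, it can help to list the best few configurations as a table. The sketch below is self-contained and deliberately small: it swaps in a `DecisionTreeClassifier` with only 8 sampled configurations and 3 folds so that it runs in seconds. Those choices, and the name `small_search`, are assumptions made for illustration and differ from the random forest search shown earlier.

```python
import pandas as pd
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Deliberately tiny search (assumed settings) so the sketch runs quickly.
small_search = RandomizedSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_distributions={
        "max_depth": randint(2, 20),
        "min_samples_split": randint(2, 20),
    },
    n_iter=8,
    cv=3,
    scoring="roc_auc",
    random_state=42,
)
small_search.fit(X, y)

results = pd.DataFrame(small_search.cv_results_)
columns = [
    "rank_test_score",
    "mean_test_score",
    "std_test_score",
    "param_max_depth",
    "param_min_samples_split",
]

# rank_test_score is 1 for the best mean score, so an ascending sort puts it first.
top = results.sort_values("rank_test_score")[columns].head(3)
print(top.to_string(index=False))
```

Reading std_test_score next to mean_test_score shows whether the gap between the top candidates is larger than the fold-to-fold variation.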