Gradient Boosting Machines (GBM)

Gradient Boosting Machines frame boosting as gradient descent in function space: each new tree is fit to the negative gradient of the loss at the current ensemble's predictions, which for squared-error loss is the ordinary residual error.


The Gradient Boosting Framework

Given a differentiable loss function L(y, F(x)), GBM iteratively adds trees h_t that best fit the negative gradient of the loss at the current model: r_i = -[∂L(y_i, F(x_i)) / ∂F(x_i)], evaluated at F = F_{t-1}. These values are called pseudo-residuals.
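As a quick sanity check of this definition, the sketch below compares an analytic negative gradient with a finite-difference estimate. The choice of binary log loss, the helper names (sigmoid, log_loss_per_sample, pseudo_residuals), and the toy values are illustrative assumptions, not something prescribed by the text above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_per_sample(y, F):
    """Binary log loss, with F interpreted as a raw score (log-odds)."""
    p = sigmoid(F)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def pseudo_residuals(y, F):
    """Negative gradient of the log loss with respect to F: y - sigmoid(F)."""
    return y - sigmoid(F)

# Toy labels and current raw predictions (assumed values).
y = np.array([1.0, 0.0, 1.0, 0.0])
F = np.array([0.3, -1.2, -0.5, 2.0])

# Central finite difference of the loss with respect to each F(x_i).
eps = 1e-6
numeric = -(log_loss_per_sample(y, F + eps) - log_loss_per_sample(y, F - eps)) / (2 * eps)
analytic = pseudo_residuals(y, F)

print("analytic:", analytic)
print("numeric: ", numeric)
print("match:", np.allclose(numeric, analytic, atol=1e-6))
```

For this loss the pseudo-residual is the gap between the label and the predicted probability, so the next tree is trained to predict how far each probability should move.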

Residual Fitting Intuition

For squared-error loss L = (1/2)(y - F(x))^2, the pseudo-residuals are simply y_i - F_{t-1}(x_i), the actual residuals. Each new tree predicts how far off the current ensemble is, and its scaled prediction is added to reduce the total loss.
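The residual-fitting idea can be illustrated with a few numbers. This is a minimal sketch under two assumptions: the targets and the learning rate eta are made-up values, and the "tree" is idealized as predicting the residuals exactly, which a real tree only approximates.

```python
import numpy as np

y = np.array([3.0, -1.0, 2.0, 5.0])   # toy targets (assumed values)
F_prev = np.full_like(y, y.mean())    # current ensemble: a constant prediction

# Pseudo-residuals for the loss (1/2) * (y - F)^2 are the plain residuals.
residuals = y - F_prev
eta = 0.1                             # learning rate (assumed value)

# Idealized update: pretend the new tree reproduces the residuals exactly.
F_next = F_prev + eta * residuals

def mse(y_true, F):
    return np.mean((y_true - F) ** 2)

print(f"MSE before update: {mse(y, F_prev):.4f}")
print(f"MSE after update:  {mse(y, F_next):.4f}")
```

Each residual shrinks by a factor of (1 - eta) in this idealized case, so the mean squared error shrinks by (1 - eta)^2 per step.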

GBM Algorithm Outline

  1. Initialize F_0(x) = argmin_γ Σ_i L(y_i, γ) (constant prediction).
  2. For t = 1 to M: compute pseudo-residuals, fit a regression tree to them, find optimal leaf values, update F_t = F_{t-1} + η · h_t.
  3. Return final ensemble F_M(x).
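The outline above can be sketched in a few lines for the squared-error case. This is an illustrative sketch, not scikit-learn's implementation: the function names fit_gbm and predict_gbm, the synthetic data, and the hyperparameter values are all assumptions. For squared error the separate "optimal leaf values" step is implicit, because the leaf means of a regression tree fit to the residuals already minimize the loss within each leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_trees=100, eta=0.1, max_depth=3):
    """Gradient boosting for squared-error loss (hypothetical helper)."""
    f0 = float(np.mean(y))              # step 1: best constant for squared error
    F = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):            # step 2: M boosting rounds
        residuals = y - F               # pseudo-residuals for squared error
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)          # leaf means are the optimal leaf values here
        F = F + eta * tree.predict(X)   # F_t = F_{t-1} + eta * h_t
        trees.append(tree)
    return f0, eta, trees               # step 3: the final ensemble

def predict_gbm(model, X):
    """Evaluate F_M(x) = F_0 + eta * sum of tree predictions."""
    f0, eta, trees = model
    F = np.full(X.shape[0], f0)
    for tree in trees:
        F = F + eta * tree.predict(X)
    return F

# Synthetic regression data (assumed, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_gbm(X, y)
baseline_mse = np.mean((y - y.mean()) ** 2)
train_mse = np.mean((y - predict_gbm(model, X)) ** 2)
print(f"Constant-model MSE: {baseline_mse:.4f}")
print(f"GBM training MSE:   {train_mse:.4f}")
```

Other losses change only two places in this loop: the formula for the pseudo-residuals and the per-leaf value computed after the tree structure is fixed.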

scikit-learn GradientBoostingClassifier

scikit-learn provides GradientBoostingClassifier and GradientBoostingRegressor with support for multiple loss functions and built-in subsampling for regularization.

Training and Tuning

<pre><code class="language-python">from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gbm = GradientBoostingClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=3,
    subsample=0.8,  # stochastic gradient boosting
    min_samples_leaf=5,
    random_state=42,
)
gbm.fit(X_train, y_train)
print(f"Test Accuracy: {gbm.score(X_test, y_test):.3f}")</code></pre>

HistGradientBoostingClassifier

<pre><code class="language-python">from sklearn.ensemble import HistGradientBoostingClassifier

# Much faster for large datasets (histogram-based, like LightGBM).
# Reuses X_train, X_test, y_train, y_test from the previous example.
hgbm = HistGradientBoostingClassifier(
    max_iter=300,
    learning_rate=0.05,
    max_depth=4,
    random_state=42,
)
hgbm.fit(X_train, y_train)
print(f"HistGBM Test Accuracy: {hgbm.score(X_test, y_test):.3f}")</code></pre>

Regularization Strategies

GBMs are prone to overfitting when the ensemble has too many trees or the learning rate is too high. Regularization options include shrinkage (learning_rate), row subsampling (subsample), limits on tree depth (max_depth), and a minimum leaf size (min_samples_leaf).
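One way to see the effect of the number of trees is to track a validation loss stage by stage. The sketch below uses staged_predict_proba from GradientBoostingClassifier; the dataset, the validation split, and the hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

gbm = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
gbm.fit(X_train, y_train)

# Validation log loss after 1, 2, ..., n_estimators trees.
val_loss = [log_loss(y_val, proba) for proba in gbm.staged_predict_proba(X_val)]
best_n = int(np.argmin(val_loss)) + 1

print(f"Lowest validation loss at {best_n} of {gbm.n_estimators} trees")
print(f"Loss at best stage: {val_loss[best_n - 1]:.4f}, at final stage: {val_loss[-1]:.4f}")
```

If the best stage comes well before the final one, the extra trees are fitting noise. The n_iter_no_change and validation_fraction parameters of the same estimator automate this kind of early stopping.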

Key Regularization Levers

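  • learning_rate: Lower values need more iterations but usually generalize better.
  • subsample: Values < 1.0 fit each tree on a random fraction of the rows, which adds stochasticity and reduces overfitting.
  • max_depth: Shallower trees (3–5) limit how complex the feature interactions captured by each tree can be.
  • min_samples_leaf: Requires a minimum number of samples in every leaf, preventing leaves with very few samples.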
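These levers interact, so they are usually tuned together. The following is a minimal sketch using GridSearchCV; the grid values, the small n_estimators, and the 3-fold setting are assumptions kept deliberately small so the search runs quickly, not recommended defaults.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# A deliberately tiny grid over the levers listed above (assumed values).
param_grid = {
    "learning_rate": [0.05, 0.2],
    "subsample": [0.8, 1.0],
    "max_depth": [2, 3],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

Because a lower learning_rate needs more trees, a fuller search would also vary n_estimators or rely on early stopping rather than fixing it as done here.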