Gradient Boosting Machines (GBM)
Gradient Boosting Machines frame boosting as gradient descent in function space: each new tree is fit to the negative gradient of the loss, evaluated at the current ensemble's predictions. For squared-error loss this negative gradient is the ordinary residual error.
The Gradient Boosting Framework
Given a differentiable loss function L(y, F(x)), GBM iteratively adds trees h_t that best fit the negative gradient of the loss with respect to the current prediction: r_i = -[∂L(y_i, F(x_i)) / ∂F(x_i)], evaluated at F = F_{t-1}. The values r_i are called pseudo-residuals.
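To make the definition concrete for a loss other than squared error, here is a minimal sketch that computes pseudo-residuals for binary log loss and checks them against a finite-difference gradient. It assumes the model output F(x) is a raw log-odds score; the function names and the sample values are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_per_sample(y, f):
    # Binary log loss expressed in terms of the raw score f (log-odds).
    p = sigmoid(f)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def pseudo_residuals(y, f):
    # Negative gradient of the log loss with respect to f simplifies to y - p.
    return y - sigmoid(f)

# Illustrative labels and current ensemble scores (assumed values).
y = np.array([1.0, 0.0, 1.0, 0.0])
f_current = np.array([0.5, -1.0, -2.0, 1.5])

# Central finite difference approximates dL/df for each sample.
eps = 1e-6
numeric_grad = (
    log_loss_per_sample(y, f_current + eps)
    - log_loss_per_sample(y, f_current - eps)
) / (2 * eps)

r = pseudo_residuals(y, f_current)
print(np.allclose(r, -numeric_grad, atol=1e-6))
```

The analytic form y - p is what the next tree would be fit to under this loss; the finite-difference check is only there to confirm the sign and scale.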
Residual Fitting Intuition
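For squared-error loss written with the conventional factor of 1/2, L = (1/2)(y_i - F(x_i))^2, the pseudo-residuals are simply y_i - F_{t-1}(x_i), the actual residuals. (Without the 1/2 they are the residuals scaled by 2, which changes nothing about the tree fit.) Each new tree predicts how much the current ensemble is wrong, and its scaled prediction is added to reduce the total loss.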
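A small numerical sketch of this intuition follows, assuming the 1/2-scaled squared-error loss. The targets, current predictions, and step size are made-up values, and the "perfect tree" that reproduces the residuals exactly is an idealization for illustration.

```python
import numpy as np

def squared_error(y, f):
    # Per-sample loss with the conventional 1/2 factor.
    return 0.5 * (y - f) ** 2

y = np.array([3.0, -1.0, 2.5])        # targets (assumed values)
f_prev = np.array([2.0, 0.0, 2.5])    # current ensemble predictions

# Pseudo-residuals for this loss are the plain residuals.
residuals = y - f_prev

# Idealized update: pretend the new tree reproduces the residuals exactly,
# then add its prediction scaled by a learning rate.
learning_rate = 0.5
f_next = f_prev + learning_rate * residuals

loss_before = squared_error(y, f_prev).sum()
loss_after = squared_error(y, f_next).sum()
print(residuals, loss_before, loss_after)
```

With a learning rate between 0 and 1, each residual shrinks by a factor of (1 - learning_rate), so the total loss drops without overshooting the targets.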
GBM Algorithm Outline
- Initialize F_0(x) = argmin_γ Σ_i L(y_i, γ) (a constant prediction).
- For t = 1 to M: compute pseudo-residuals, fit a regression tree h_t to them, find the optimal leaf values, and update F_t = F_{t-1} + η · h_t.
- Return the final ensemble F_M(x).
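The outline can be sketched from scratch for the squared-error case, using scikit-learn's DecisionTreeRegressor as the base learner. This is a minimal illustration, not a reference implementation: the helper names fit_gbm and predict_gbm, the synthetic sine data, and the hyperparameter values are all assumptions. For squared error, the leaf means a regression tree already produces are the loss-minimizing leaf values, so the "find optimal leaf values" step needs no separate code here.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_trees=50, learning_rate=0.1, max_depth=3):
    """Fit a squared-error GBM; returns (initial constant, list of trees)."""
    f0 = float(np.mean(y))               # F_0: the mean minimizes squared error
    current = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - current          # pseudo-residuals for squared error
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        current = current + learning_rate * tree.predict(X)  # F_t update
        trees.append(tree)
    return f0, trees

def predict_gbm(f0, trees, X, learning_rate=0.1):
    """Evaluate F_M(x) = F_0 + learning_rate * sum of tree predictions."""
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred

# Synthetic regression data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

f0, trees = fit_gbm(X, y)
mse_initial = np.mean((y - f0) ** 2)
mse_final = np.mean((y - predict_gbm(f0, trees, X)) ** 2)
print(f"training MSE: {mse_initial:.3f} -> {mse_final:.3f}")
```

The same learning_rate must be used at fit and predict time; a fuller implementation would store it alongside the trees. Other losses would require replacing the residual line with the appropriate negative gradient and re-estimating leaf values per leaf.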
scikit-learn GradientBoostingClassifier
scikit-learn provides GradientBoostingClassifier and GradientBoostingRegressor with support for multiple loss functions and built-in subsampling for regularization.
Training and Tuning
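A minimal training and tuning sketch, assuming a synthetic dataset from make_classification and a deliberately small search grid; the specific parameter values are placeholders rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data (assumed for illustration).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline model; subsample < 1.0 enables stochastic gradient boosting.
clf = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3,
    subsample=0.8, random_state=0,
)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

# Small cross-validated grid over two of the main hyperparameters.
param_grid = {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, subsample=0.8, random_state=0),
    param_grid, cv=3,
)
search.fit(X_train, y_train)
print(test_accuracy, search.best_params_)
```

In practice learning_rate and n_estimators trade off against each other, so they are usually tuned together rather than in isolation.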
HistGradientBoostingClassifier
Regularization Strategies
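GBMs are prone to overfitting with too many trees or too high a learning rate. Regularization options include shrinkage (a lower learning_rate), row subsampling (subsample), limiting tree depth (max_depth), and requiring a minimum number of samples per leaf (min_samples_leaf).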
Key Regularization Levers
- learning_rate: Lower → more iterations needed, better generalization.
- subsample: < 1.0 introduces stochasticity, reduces overfitting.
- max_depth: Shallower trees (3–5) limit the order of feature interactions each tree can model.
- min_samples_leaf: Prevents leaves with very few samples.
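One way to see these levers in action is to fit a regularized model and track validation loss after every boosting stage using staged_predict_proba, then pick the stage where it is lowest. The sketch assumes synthetic data and arbitrary (not tuned) settings for the four parameters above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic data and a held-out validation split (assumed for illustration).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# All four levers set explicitly; the values are placeholders.
gbm = GradientBoostingClassifier(
    n_estimators=150,
    learning_rate=0.05,    # shrinkage
    subsample=0.8,         # stochastic gradient boosting
    max_depth=3,           # shallow trees
    min_samples_leaf=5,    # no tiny leaves
    random_state=0,
)
gbm.fit(X_train, y_train)

# Validation log loss after each stage (1 tree, 2 trees, ..., 150 trees).
val_losses = [
    log_loss(y_val, proba) for proba in gbm.staged_predict_proba(X_val)
]
best_n_trees = int(np.argmin(val_losses)) + 1
print(best_n_trees, min(val_losses))
```

If the minimum falls well before the last stage, the ensemble has started to overfit and fewer trees (or a lower learning rate) may be appropriate; if it falls at the last stage, more iterations may still help.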