XGBoost: Advanced Gradient Boosting
XGBoost (eXtreme Gradient Boosting) extends gradient boosting with L1/L2 regularization on leaf weights, a second-order Taylor approximation of the loss for scoring splits, parallelized split finding within each tree, and built-in handling of missing values.
XGBoost Innovations
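Unlike a vanilla GBM, which typically fits each tree to first-order gradients only, XGBoost uses both the first derivative (gradient) and the second derivative (Hessian) of the loss to compute optimal leaf weights and split gains. Each leaf value is therefore a Newton-style step rather than a plain gradient step.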
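To make the gradient and Hessian concrete, here is a minimal sketch for binary log loss, where the derivatives are taken with respect to the raw score (the margin before the sigmoid). The function names and toy values are illustrative assumptions, not part of the XGBoost API.

```python
import math


def sigmoid(z):
    """Map a raw score (margin) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad_hess(y_true, raw_score):
    """Gradient and Hessian of binary log loss w.r.t. the raw score."""
    p = sigmoid(raw_score)
    grad = p - y_true        # first derivative
    hess = p * (1.0 - p)     # second derivative, always positive
    return grad, hess


# Toy labels and current raw predictions (hypothetical values).
labels = [1, 0, 1, 1]
raw_scores = [0.0, 0.0, 2.0, -1.0]

for y, s in zip(labels, raw_scores):
    g, h = logistic_grad_hess(y, s)
    print(f"y={y} score={s:+.1f} grad={g:+.4f} hess={h:.4f}")
```

Summing these per-example values over the rows that fall into a leaf gives the statistics from which leaf weights and split gains are computed.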
Regularized Objective
XGBoost minimizes: L(Φ) = Σ_i l(y_i, ŷ_i) + Σ_k Ω(f_k), where Ω(f) = γT + 0.5λ||w||² + α||w||₁. Here T is the number of leaves, γ penalizes tree complexity, λ is L2 weight regularization, and α is L1 weight regularization.
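As a rough illustration of how the penalty terms enter tree building, the sketch below computes the closed-form leaf weight w* = -G / (H + λ) and the gain of a candidate split, where G and H are the sums of gradients and Hessians in a node. It assumes α = 0 to keep the formulas simple; the function names and numbers are hypothetical.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf weight for gradient sum G and Hessian sum H (alpha = 0 assumed)."""
    return -G / (H + lam)


def leaf_score(G, H, lam):
    """Structure score of a leaf; larger means a larger loss reduction."""
    return G * G / (H + lam)


def split_gain(G_left, H_left, G_right, H_right, lam, gamma):
    """Gain of splitting a node into two children, minus the per-leaf penalty gamma."""
    parent = leaf_score(G_left + G_right, H_left + H_right, lam)
    children = leaf_score(G_left, H_left, lam) + leaf_score(G_right, H_right, lam)
    return 0.5 * (children - parent) - gamma


# Toy gradient/Hessian sums for the two sides of a candidate split.
G_L, H_L = -4.0, 3.0
G_R, H_R = 2.0, 1.0
lam, gamma = 1.0, 0.5

print("left leaf weight: ", leaf_weight(G_L, H_L, lam))
print("right leaf weight:", leaf_weight(G_R, H_R, lam))
print("split gain:       ", split_gain(G_L, H_L, G_R, H_R, lam, gamma))
```

A larger λ shrinks leaf weights toward zero, and a split is only worth keeping when its gain stays positive after subtracting γ.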
Training XGBoost with scikit-learn API
XGBoost provides a scikit-learn-compatible API through XGBClassifier and XGBRegressor, supporting early stopping and cross-validation.
Basic XGBoost Training
Early Stopping
Early stopping monitors a metric on a held-out validation set and stops training when that metric has not improved for early_stopping_rounds consecutive rounds. This limits overfitting without manually tuning n_estimators.
Performance and Scalability
XGBoost stores features in column-block data structures, supports out-of-core computation for datasets that do not fit in memory, and parallelizes work across CPU threads and GPUs. Together these make it one of the faster gradient boosting implementations in practice.