Introduction to Ensemble Learning
Ensemble learning combines predictions from multiple base models so that their collective output is more accurate and robust than any individual learner.
Why Ensembles Work
Individual models make different errors. When errors are uncorrelated, averaging predictions cancels out noise — this is the wisdom of crowds effect formalized in machine learning.
Bias-Variance Decomposition View
Averaging M independent models each with variance \u03c3\u00b2 reduces ensemble variance to \u03c3\u00b2 / M (assuming no correlation). In practice, models are correlated, so bagging and feature randomization are used to reduce correlation and still achieve meaningful variance reduction.
Three Pillars: Bagging, Boosting, Stacking
The three main ensemble paradigms differ in how they build and combine base models.
Overview of Each Strategy
- Bagging: Trains models in parallel on bootstrap samples; averages predictions (e.g., Random Forests).
- Boosting: Trains models sequentially, each correcting the previous model's errors (e.g., AdaBoost, XGBoost).
- Stacking: Uses a meta-learner to combine diverse base model predictions.
Quick Demo: VotingClassifier
Diversity: The Secret Ingredient
Ensembles only improve over single models when base learners are diverse (make different mistakes). Diversity is introduced via different data subsets, different features, or different algorithms.
Promoting Diversity
Use bootstrap sampling, random feature subsets, or entirely different model families. Highly correlated base learners provide little benefit — measuring pairwise prediction correlation can help diagnose poor ensemble design.