Introduction to Ensemble Learning

Ensemble learning combines predictions from multiple base models so that their collective output is more accurate and robust than any individual learner.


Why Ensembles Work

Individual models make different errors. When errors are uncorrelated, averaging predictions cancels out noise — this is the wisdom of crowds effect formalized in machine learning.

Bias-Variance Decomposition View

Averaging M independent models each with variance \u03c3\u00b2 reduces ensemble variance to \u03c3\u00b2 / M (assuming no correlation). In practice, models are correlated, so bagging and feature randomization are used to reduce correlation and still achieve meaningful variance reduction.

Three Pillars: Bagging, Boosting, Stacking

The three main ensemble paradigms differ in how they build and combine base models.

Overview of Each Strategy

  • Bagging: Trains models in parallel on bootstrap samples; averages predictions (e.g., Random Forests).
  • Boosting: Trains models sequentially, each correcting the previous model's errors (e.g., AdaBoost, XGBoost).
  • Stacking: Uses a meta-learner to combine diverse base model predictions.

Quick Demo: VotingClassifier

<pre><code class="language-python">from sklearn.ensemble import VotingClassifier from sklearn.tree import DecisionTreeClassifier from sklearn.neighbors import KNeighborsClassifier from sklearn.linear_model import LogisticRegression from sklearn.datasets import load_iris from sklearn.model_selection import cross_val_score X, y = load_iris(return_X_y=True) ensemble = VotingClassifier(estimators=[ ('dt', DecisionTreeClassifier(max_depth=4)), ('knn', KNeighborsClassifier(n_neighbors=5)), ('lr', LogisticRegression(max_iter=200)) ], voting='soft') scores = cross_val_score(ensemble, X, y, cv=5) print(f"Ensemble CV Accuracy: {scores.mean():.3f}")</pre>

Diversity: The Secret Ingredient

Ensembles only improve over single models when base learners are diverse (make different mistakes). Diversity is introduced via different data subsets, different features, or different algorithms.

Promoting Diversity

Use bootstrap sampling, random feature subsets, or entirely different model families. Highly correlated base learners provide little benefit — measuring pairwise prediction correlation can help diagnose poor ensemble design.