Regularisation Overview
Regularisation is the technique of adding a penalty to the loss function that discourages large coefficients, effectively limiting a model's freedom to overfit training data.
The Regularisation Idea
Instead of minimising only the training error, a regularised model minimises Loss + λ × Penalty(β). The hyperparameter λ controls the strength of regularisation: larger λ shrinks coefficients more aggressively.
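As a minimal sketch of this objective, the snippet below takes squared error as the loss and the sum of squared coefficients as the penalty, then solves for the coefficients in closed form at several values of λ. The helper name ridge_fit, the synthetic data, and the chosen λ values are illustrative assumptions, not part of the original text.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimise ||y - X b||^2 + lam * ||b||^2 using the closed-form solution.

    Hypothetical helper for illustration; no intercept is fitted.
    """
    n_features = X.shape[1]
    # Normal equations with the penalty added to the diagonal of X^T X.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Synthetic data (assumed for the example): 50 samples, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_beta = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_beta + rng.normal(scale=0.5, size=50)

lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge_fit(X, y, lam))) for lam in lambdas]

for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:>6}: ||beta|| = {norm:.3f}")
```

The printed coefficient norm should fall as λ grows, which is the shrinkage effect described above. Setting λ to 0 recovers the ordinary least-squares fit.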
Why Regularisation Works
Overfit linear models often have large, erratic coefficients. Penalising large coefficients discourages the model from leaning heavily on a few noisy signals, so the fitted coefficients change less from one training sample to the next. This reduces variance at the cost of introducing some bias, and the bias grows as the penalty gets stronger.
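The variance reduction can be illustrated with a small simulation. The sketch below repeatedly draws fresh training sets with two nearly collinear features, fits an unpenalised model and a Ridge model to each, and compares how much the fitted coefficients vary across draws. The data-generating function make_data, the sample size, and the penalty strength are assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)


def make_data(n=30):
    """Hypothetical generator: two nearly collinear features, true coefficients (1, 1)."""
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)
    X = np.column_stack([x1, x2])
    y = x1 + x2 + rng.normal(scale=1.0, size=n)
    return X, y


ols_coefs, ridge_coefs = [], []
for _ in range(200):
    X, y = make_data()
    ols_coefs.append(LinearRegression().fit(X, y).coef_)
    ridge_coefs.append(Ridge(alpha=1.0).fit(X, y).coef_)

# Total variance of the fitted coefficients across the 200 training sets.
ols_var = float(np.var(ols_coefs, axis=0).sum())
ridge_var = float(np.var(ridge_coefs, axis=0).sum())

print(f"OLS coefficient variance:   {ols_var:.3f}")
print(f"Ridge coefficient variance: {ridge_var:.3f}")
print("Mean Ridge coefficients:", np.mean(ridge_coefs, axis=0))
```

In this setup the unpenalised coefficients swing widely between training sets, while the Ridge coefficients stay comparatively stable. The mean Ridge coefficients are printed so that any shrinkage away from the true values can be inspected.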
Choosing λ via Cross-Validation
The regularisation strength λ is a hyperparameter that must be tuned (scikit-learn names this parameter alpha). The standard approach is to search a logarithmic grid of values with scikit-learn's GridSearchCV, or with the built-in RidgeCV or LassoCV estimators. Never select λ based on test-set performance; use held-out validation folds only.
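One way this tuning might look in scikit-learn is sketched below. RidgeCV searches a logarithmic grid using 5-fold cross-validation on the training split only, and the test split is touched once at the end. The synthetic dataset, the grid bounds, the fold count, and the StandardScaler step are assumptions made for the example; scaling is included because the penalty treats all coefficients alike and is therefore sensitive to feature scale.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem (assumed for illustration).
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Logarithmic grid of candidate strengths, from 1e-3 to 1e3.
alphas = np.logspace(-3, 3, 13)

# Cross-validation happens inside RidgeCV, using the training data only.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas, cv=5))
model.fit(X_train, y_train)

best_alpha = model.named_steps["ridgecv"].alpha_
test_r2 = model.score(X_test, y_test)  # reported once, never used for selection

print(f"selected alpha: {best_alpha:g}")
print(f"test R^2: {test_r2:.3f}")
```

If the selected value sits at either end of the grid, widening the grid and refitting is usually worthwhile, since the best value may lie outside the range searched.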
Types of Regularisation
Different penalty norms shrink coefficients in different ways and differ in whether they produce sparse models.
L1, L2, and Elastic Net
L2 (Ridge) adds Σβᵢ²: it shrinks all coefficients toward zero but generally leaves them non-zero. L1 (Lasso) adds Σ|βᵢ|: it can shrink coefficients exactly to zero, performing automatic feature selection. Elastic Net adds a weighted combination of both penalties, so it can produce sparse models while keeping some of Ridge's stability when features are correlated.
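The difference in sparsity can be seen by fitting all three models to the same data and counting coefficients that are exactly zero. This is a rough sketch: the synthetic dataset (5 informative features out of 20), the shared alpha of 1.0, and the l1_ratio of 0.5 are assumptions, and the exact counts will vary with those choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Only 5 of the 20 features carry signal (assumed setup for illustration).
X, y = make_regression(
    n_samples=100, n_features=20, n_informative=5, noise=5.0, random_state=0
)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    # Count coefficients driven exactly to zero by the penalty.
    zero_counts[name] = int(np.sum(model.coef_ == 0.0))

for name, count in zero_counts.items():
    print(f"{name:>12}: {count} of {X.shape[1]} coefficients are exactly zero")
```

Ridge is expected to keep every coefficient non-zero, while Lasso should zero out some of the uninformative features. In ElasticNet, l1_ratio sets the mix between the two penalties: 1.0 is the pure L1 penalty and values near 0 approach the pure L2 penalty.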