Regularisation Overview

Regularisation adds a penalty on large coefficients to the loss function, limiting a model's freedom to overfit the training data.


The Regularisation Idea

Instead of minimising only the training error, a regularised model minimises Loss + λ × Penalty(β). The hyperparameter λ controls the strength of regularisation: larger λ shrinks coefficients more aggressively.
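As a minimal sketch of this objective, the snippet below assumes a mean-squared-error loss and an L2 penalty, and fits synthetic data with several penalty strengths. The names penalised_loss and ridge_fit, the data, and the grid of values are illustrative assumptions, not part of any library.

```python
import numpy as np


def penalised_loss(X, y, beta, lam):
    """Mean squared error plus lam times the L2 penalty sum(beta_i ** 2)."""
    residuals = y - X @ beta
    return np.mean(residuals ** 2) + lam * np.sum(beta ** 2)


def ridge_fit(X, y, lam):
    """Exact minimiser of penalised_loss (no intercept, for brevity)."""
    n, p = X.shape
    # Setting the gradient to zero gives (X^T X / n + lam * I) beta = X^T y / n.
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


# Synthetic data (an assumption for illustration): 5 features, 2 irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

norms = []
for lam in [0.0, 0.1, 1.0, 10.0]:
    beta = ridge_fit(X, y, lam)
    norms.append(float(np.linalg.norm(beta)))
    objective = penalised_loss(X, y, beta, lam)
    print(f"lambda={lam:<5} ||beta||={norms[-1]:.3f} objective={objective:.3f}")
```

The printed coefficient norm should fall as the penalty strength grows, which is the shrinkage effect described above. With the penalty strength set to zero, the fit reduces to ordinary least squares.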

Why Regularisation Works

Overfit models often have large, erratic coefficients because they chase noise in the training data. Penalising large coefficients makes such fits costly, so the model relies less on a few noisy signals. With an L2 penalty in particular, weight tends to be spread across correlated features rather than concentrated in one of them. This reduces variance at the cost of introducing some bias.
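The variance reduction can be checked with a small simulation. The sketch below assumes a fixed design with strongly correlated features, refits on fresh noise many times, and compares the spread of the fitted coefficients with and without an L2 penalty. The data, the penalty value of 0.1, and the helper names fit and simulate are all illustrative assumptions. The penalty here is applied to the unscaled sum of squared errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 6
# Strongly correlated features: a shared latent signal plus small independent noise.
latent = rng.normal(size=(n, 1))
X = latent + 0.1 * rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 1.0])


def fit(X, y, lam):
    # Ridge solution; lam = 0 reduces to ordinary least squares.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def simulate(lam, trials=500):
    # Refit on fresh noise each trial and collect the coefficient vectors.
    betas = np.array([
        fit(X, X @ true_beta + rng.normal(size=n), lam) for _ in range(trials)
    ])
    variance = betas.var(axis=0).sum()  # total variance across refits
    bias_sq = np.sum((betas.mean(axis=0) - true_beta) ** 2)  # squared bias
    return variance, bias_sq


var_ols, bias_ols = simulate(lam=0.0)
var_ridge, bias_ridge = simulate(lam=0.1)
print(f"unpenalised: variance={var_ols:.3f}  bias^2={bias_ols:.4f}")
print(f"penalised:   variance={var_ridge:.3f}  bias^2={bias_ridge:.4f}")
```

The penalised fit should show a clearly lower total variance. Its squared bias is usually somewhat higher, although the bias estimates from a simulation of this size are noisy.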

Choosing λ via Cross-Validation

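The regularisation strength λ is a hyperparameter that must be tuned. The standard approach is to search a logarithmic grid of values with scikit-learn's GridSearchCV or the dedicated RidgeCV and LassoCV estimators (scikit-learn calls this parameter alpha). Never select λ based on test-set performance: choose it using held-out validation folds only, and reserve the test set for the final evaluation.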
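A minimal sketch of this workflow with scikit-learn's RidgeCV follows. The synthetic dataset, the grid bounds, and the 5-fold setting are assumptions chosen for illustration. Features are standardised inside the pipeline so the penalty treats them on a common scale.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem (illustrative only).
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# The test set is split off first and never used to choose the penalty strength.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Logarithmic grid of candidate strengths (scikit-learn names this `alpha`).
alphas = np.logspace(-3, 3, 13)

# RidgeCV picks the best alpha using 5-fold cross-validation on the training data.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas, cv=5))
model.fit(X_train, y_train)

best_alpha = model.named_steps["ridgecv"].alpha_
test_r2 = model.score(X_test, y_test)  # reported once, after selection is done
print(f"selected alpha={best_alpha:.4g}, test R^2={test_r2:.3f}")
```

If the selected value sits at either end of the grid, widening the grid in that direction is usually worth trying.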

Types of Regularisation

Different penalty norms shrink coefficients in different ways, and they differ in whether they produce sparse models.

L1, L2, and Elastic Net

L2 (Ridge) adds Σβᵢ² to the loss: it shrinks all coefficients toward zero but generally keeps them all non-zero. L1 (Lasso) adds Σ|βᵢ|: it can shrink coefficients exactly to zero, performing automatic feature selection. Elastic Net blends both penalties, so it can zero out coefficients like Lasso while keeping Ridge's more stable handling of correlated features.
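The difference in sparsity can be seen by fitting all three estimators to the same data and counting coefficients that are exactly zero. This is a sketch on synthetic data: the alpha and l1_ratio values are arbitrary illustrations, and scikit-learn scales the Ridge and Lasso objectives differently, so the strengths are not directly comparable.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# 20 features, only 5 of which carry signal (illustrative data).
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)
X = StandardScaler().fit_transform(X)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=5.0),
    # l1_ratio sets the mix: 1.0 is pure L1, 0.0 is pure L2.
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.9),
}

zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    # Count coefficients that were driven exactly to zero.
    zero_counts[name] = int(np.sum(model.coef_ == 0.0))
    print(f"{name:12s} zero coefficients: {zero_counts[name]} of {X.shape[1]}")
```

Ridge should report no exact zeros, while Lasso should drop some of the uninformative features. The Elastic Net count depends on the chosen mix of the two penalties.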