Ridge Regression (L2 Penalty)
Ridge regression adds a penalty proportional to the sum of squared coefficients to the OLS loss. This shrinks all coefficients toward zero, though in general never exactly to zero, which reduces overfitting.
Ridge Loss Function
Ridge minimises: SSR + α Σβᵢ², where α (alpha) is the regularisation strength. The L2 penalty keeps all features in the model, making Ridge ideal when many features contribute weak signals.
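As a minimal sketch of the objective above, the loss can be evaluated directly with NumPy. The function name `ridge_loss`, the toy data, and the no-intercept setup are illustrative assumptions, not part of the original text.

```python
import numpy as np

def ridge_loss(X, y, beta, alpha):
    """Return SSR + alpha * sum(beta_i^2) for a linear model with no intercept."""
    residuals = y - X @ beta
    ssr = float(residuals @ residuals)      # sum of squared residuals
    penalty = alpha * float(beta @ beta)    # L2 penalty on the coefficients
    return ssr + penalty

# Toy data (assumed for illustration): beta fits y exactly, so SSR is zero
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])

loss_no_penalty = ridge_loss(X, y, beta, alpha=0.0)    # pure SSR
loss_with_penalty = ridge_loss(X, y, beta, alpha=0.5)  # SSR + 0.5 * (1^2 + 2^2)
```

Because these coefficients fit the toy data perfectly, the whole loss at alpha = 0.5 comes from the penalty term, which shows how a larger alpha makes large coefficients more expensive.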
Fitting Ridge in scikit-learn
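A minimal sketch of a Ridge fit follows, assuming synthetic data and an arbitrary alpha of 1.0. The variable names, the random data, and the choice to standardise features first are illustrative assumptions. Standardising is a common choice because the L2 penalty depends on the scale of each feature.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed for illustration): 100 samples, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_coef = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = X @ true_coef + rng.normal(scale=0.5, size=100)

# Scale the features, then fit Ridge with regularisation strength alpha
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)

ridge_coef = model.named_steps["ridge"].coef_  # one coefficient per feature
r2 = model.score(X, y)                         # R^2 on the training data
```

In practice alpha would usually be tuned by cross-validation, for example with `RidgeCV`, rather than fixed by hand.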
When to Use Ridge
Ridge excels when you believe most features are genuinely informative but their signals are noisy or correlated.
Ridge vs. OLS on Collinear Data
When features are highly correlated, XᵀX is nearly singular and computing (XᵀX)⁻¹ becomes numerically unstable. Ridge adds αI to the diagonal of XᵀX before inversion, giving (XᵀX + αI)⁻¹. For any α > 0 this matrix is invertible, and its condition number improves as α grows. This is why Ridge is a standard remedy for multicollinearity.
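The effect on conditioning can be sketched numerically. The example below assumes two nearly identical synthetic features and alpha = 1.0, with no intercept. All names and data are illustrative.

```python
import numpy as np

# Two almost perfectly collinear features (assumed for illustration)
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-4, size=n)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=n)

gram = X.T @ X                                   # X^T X, nearly singular
alpha = 1.0
ridge_gram = gram + alpha * np.eye(X.shape[1])   # X^T X + alpha * I

cond_ols = np.linalg.cond(gram)          # very large condition number
cond_ridge = np.linalg.cond(ridge_gram)  # much smaller after adding alpha * I

# Closed-form ridge coefficients via a linear solve rather than an explicit inverse
beta_ridge = np.linalg.solve(ridge_gram, X.T @ y)
```

Adding alpha to every eigenvalue of XᵀX lifts the smallest eigenvalue away from zero, so the ratio of largest to smallest eigenvalue falls sharply.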