The Ordinary Least Squares (OLS) Method

Ordinary Least Squares (OLS) is the mathematical engine behind linear regression: it finds the line (or hyperplane) that minimises the total squared prediction error. That minimiser is unique whenever the features are not perfectly collinear.

The OLS Objective

OLS minimises the Sum of Squared Residuals (SSR): SSR = Σ(yᵢ - ŷᵢ)², where yᵢ is the observed value and ŷᵢ is the model's prediction for observation i. Squaring the residuals penalises large errors more than small ones and yields a smooth, convex objective with a closed-form minimiser.
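As a minimal sketch of this objective, the snippet below computes the SSR for two candidate lines on a small made-up dataset. The data values and the helper name `ssr` are illustrative assumptions, not part of any library.

```python
import numpy as np

def ssr(y, y_hat):
    """Sum of squared residuals between observed and predicted values."""
    residuals = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.sum(residuals ** 2))

# Illustrative data (assumed): roughly y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

ssr_good = ssr(y, 2.0 * x + 1.0)  # candidate line close to the data
ssr_bad = ssr(y, 1.0 * x + 3.0)   # candidate line with the wrong slope and intercept

print(f"SSR (slope 2, intercept 1): {ssr_good:.2f}")
print(f"SSR (slope 1, intercept 3): {ssr_bad:.2f}")
```

The line that tracks the data more closely has the far smaller SSR; OLS picks the coefficients for which this quantity is as small as possible.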

Why Squared Errors?

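Squaring residuals has two benefits: it removes the sign (so positive and negative errors don't cancel), and it makes the penalty grow quadratically, so a residual twice as large costs four times as much. The flip side is that OLS is sensitive to outliers, a known limitation that is usually managed with robust regression (for example, a Huber loss) when outliers are present. Regularisation on its own does not remove this sensitivity, because ridge and lasso still use a squared-error loss.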
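The outlier sensitivity described above can be seen in a small experiment. This is a sketch on assumed, noise-free data; the helper `fit_line` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0        # exactly linear data (assumed)

y_outlier = y_clean.copy()
y_outlier[-1] += 50.0          # corrupt a single observation

beta_clean = fit_line(x, y_clean)
beta_outlier = fit_line(x, y_outlier)

print(f"Clean fit:        intercept={beta_clean[0]:.2f}, slope={beta_clean[1]:.2f}")
print(f"With one outlier: intercept={beta_outlier[0]:.2f}, slope={beta_outlier[1]:.2f}")
```

One corrupted point out of ten is enough to move the fitted slope well away from 2, because its squared residual dominates the objective.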

The Normal Equations

For a design matrix X and target vector y, the closed-form OLS solution is β = (XᵀX)⁻¹Xᵀy, which exists when XᵀX is invertible (that is, when X has full column rank). scikit-learn uses numerically stable solvers (e.g., SVD-based least squares) instead of directly inverting XᵀX, but in that case the result is the same up to floating-point error.

<pre><code class="language-python">import numpy as np

# Manual OLS via normal equations
X_b = np.c_[np.ones((50, 1)), np.random.rand(50, 1)]  # add bias column
y = 3 * X_b[:, 1] + 2 + np.random.randn(50) * 0.5

# pinv (pseudo-inverse) stands in for a plain inverse of X^T X
beta = np.linalg.pinv(X_b.T @ X_b) @ X_b.T @ y
print(f"Intercept: {beta[0]:.2f}, Slope: {beta[1]:.2f}")</code></pre>
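To illustrate the point about solver choice, the sketch below fits the same synthetic data two ways: by solving the normal equations as a linear system, and with NumPy's SVD-based `np.linalg.lstsq`. The data-generating values and the seed are assumptions made for the example; this is not a reproduction of scikit-learn's internals.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic data (assumed): y = 2 + 3x + noise
x = rng.random(50)
X_b = np.column_stack([np.ones(50), x])  # bias column + one feature
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)

# Normal equations, solved as a linear system rather than by explicit inversion
beta_normal = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# SVD-based least squares on X directly
beta_lstsq, _, rank, _ = np.linalg.lstsq(X_b, y, rcond=None)

print("Normal equations:", beta_normal)
print("np.linalg.lstsq: ", beta_lstsq)
print("Column rank of X:", rank)
```

On a well-conditioned problem like this one the two estimates agree to floating-point precision. The SVD route matters when the columns of X are nearly collinear, because forming XᵀX squares the condition number.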

Assumptions of OLS

OLS is the Best Linear Unbiased Estimator (BLUE) only when its assumptions hold. Violations degrade inference (standard errors, confidence intervals, hypothesis tests) even if predictions look acceptable.

Key Gauss-Markov Assumptions

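Linearity: y is a linear function of the model parameters, plus an error term.
Independence: The errors of different observations are uncorrelated with each other (independent observations satisfy this).
Homoskedasticity: The error variance is constant across all values of X.
No perfect multicollinearity: No feature is an exact linear combination of the other features.
Zero-mean errors: The errors have expected value zero given X, so the model does not systematically over- or under-predict.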
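Some of the assumptions listed above can be probed numerically. The following is a rough sketch on assumed synthetic data, using only NumPy; the split-half variance ratio is a crude illustration, not a formal statistical test.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumed): two features, constant-variance noise
n = 200
x1 = rng.random(n)
x2 = rng.random(n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.3, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# No perfect multicollinearity: the design matrix has full column rank
full_rank = np.linalg.matrix_rank(X) == X.shape[1]

# Adding an exact multiple of x1 breaks that assumption
X_collinear = np.column_stack([X, 2.0 * x1])
collinear_rank = np.linalg.matrix_rank(X_collinear)

# With an intercept, OLS residuals average to zero by construction,
# so this alone cannot confirm the zero-mean error assumption
mean_residual = residuals.mean()

# Crude homoskedasticity check: residual variance for low vs high fitted values
order = np.argsort(fitted)
low = residuals[order[: n // 2]]
high = residuals[order[n // 2:]]
variance_ratio = high.var(ddof=1) / low.var(ddof=1)

print(f"Full column rank: {full_rank}")
print(f"Rank with duplicated feature: {collinear_rank} of {X_collinear.shape[1]} columns")
print(f"Mean residual: {mean_residual:.2e}")
print(f"Variance ratio (high / low fitted values): {variance_ratio:.2f}")
```

A variance ratio far from 1 would suggest heteroskedasticity and a rank below the number of columns signals perfect multicollinearity. Linearity and uncorrelated errors are usually assessed with residual plots instead.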