Multiple Linear Regression Models

Multiple linear regression generalises the single-feature case to any number of input variables, enabling predictions that depend on many factors simultaneously.


Model Structure

The model is y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ + ε, where β₀ is the intercept and ε is the error term. Each coefficient βᵢ represents the expected change in y for a one-unit increase in xᵢ, holding all other features constant.
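As a quick illustration of the "holding all other features constant" reading, the sketch below evaluates the noise-free part of the model for two features. The coefficient values and the `predict` helper are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical coefficients for a two-feature model (illustrative values only).
beta_0 = 1.5                    # intercept
beta = np.array([2.0, -0.5])    # slopes for x1 and x2


def predict(x):
    """Return the noise-free prediction beta_0 + beta . x for one sample."""
    return beta_0 + float(np.dot(beta, x))


x = np.array([3.0, 4.0])
x_plus_one = x + np.array([1.0, 0.0])   # raise x1 by one unit, keep x2 fixed

base = predict(x)
shifted = predict(x_plus_one)

# The difference equals the coefficient of x1.
print(base, shifted, shifted - base)
```

Raising x1 by one unit while x2 stays fixed changes the prediction by exactly the coefficient of x1, which is the interpretation given above.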

Fitting with scikit-learn

<pre><code class="language-python">from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the California housing data (downloaded and cached on first use).
data = fetch_california_housing()

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Fit ordinary least squares on the training split.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Report test-set R² and one coefficient per feature.
print(f"R²: {r2_score(y_test, y_pred):.3f}")
for name, coef in zip(data.feature_names, model.coef_):
    print(f"  {name}: {coef:.4f}")</code></pre>

Interpreting Coefficients

Because features typically have different scales, raw coefficient magnitudes are not directly comparable: the size of a coefficient depends on the units of its feature. Standardise the features first (zero mean, unit variance) so that each coefficient describes the change in y per standard deviation of its feature. After standardisation, larger absolute coefficients indicate a stronger association with the target, although this is only a rough guide to importance when features are correlated.
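A minimal sketch of the effect of standardisation, using synthetic data rather than any real dataset. The feature names `income` and `rooms`, their scales, and the true coefficients are assumptions made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic features on very different scales (assumed for illustration).
income = rng.normal(50_000, 15_000, n)   # large numeric scale
rooms = rng.normal(5, 1.5, n)            # small numeric scale
X = np.column_stack([income, rooms])

# One standard deviation of income shifts y by about 3,
# one standard deviation of rooms by about 1.5.
y = 0.0002 * income + 1.0 * rooms + rng.normal(0, 1, n)

# Fit once on raw features and once with standardisation in a pipeline.
raw = LinearRegression().fit(X, y)
scaled = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

raw_coefs = raw.coef_
std_coefs = scaled.named_steps["linearregression"].coef_

print("raw coefficients:         ", raw_coefs)
print("standardised coefficients:", std_coefs)
```

On the raw scale the `rooms` coefficient looks far larger, but after standardisation `income` has the larger absolute coefficient, matching how the data were generated. Putting the scaler inside a pipeline also means that, with a train/test split, the scaling statistics are learned from the training data only.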

Multicollinearity

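When two or more features are highly correlated, the model cannot cleanly separate their individual effects. Coefficient estimates become unstable (small changes in the training data can cause large swings in their value, or even their sign) and hard to interpret, even when overall predictive accuracy is largely unaffected.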
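The instability can be made visible with a small experiment. The sketch below refits the model on bootstrap resamples of synthetic data and measures how much the first coefficient varies. The helper `coef_spread` and the data-generating process are hypothetical, not part of any library.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200


def coef_spread(noise_scale, n_resamples=100):
    """Std. dev. of the first coefficient across bootstrap refits."""
    x1 = rng.normal(size=n)
    # x2 is x1 plus noise: a small noise_scale makes the two nearly collinear.
    x2 = x1 + noise_scale * rng.normal(size=n)
    X = np.column_stack([x1, x2])
    y = x1 + x2 + rng.normal(size=n)

    coefs = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)   # sample rows with replacement
        coefs.append(LinearRegression().fit(X[idx], y[idx]).coef_[0])
    return float(np.std(coefs))


spread_collinear = coef_spread(noise_scale=0.01)   # correlation close to 1
spread_distinct = coef_spread(noise_scale=1.0)     # moderate correlation

print(f"nearly collinear features: {spread_collinear:.3f}")
print(f"more distinct features:    {spread_distinct:.3f}")
```

With nearly collinear features the coefficient varies far more from one resample to the next, even though the underlying relationship is the same in both cases.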

Detecting Multicollinearity with VIF

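The Variance Inflation Factor (VIF) quantifies how much a coefficient's variance is inflated by collinearity. For feature j it is defined as VIFⱼ = 1 / (1 − Rⱼ²), where Rⱼ² is the R² obtained by regressing feature j on all the other features. A VIF above 10 is a common warning threshold. Remedies include removing redundant features, combining correlated features with PCA, or switching to Ridge regression, whose L2 penalty stabilises the coefficient estimates under collinearity.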
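One way to compute VIF by hand is to regress each feature on the others and apply 1 / (1 − R²). The sketch below does this with scikit-learn on synthetic data. The `vif` helper is a hypothetical illustration rather than a library function, and it assumes no feature is an exact linear combination of the others (which would make R² equal to 1).

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def vif(X):
    """Return one VIF per column: 1 / (1 - R^2) of that column on the rest."""
    X = np.asarray(X, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        target = X[:, j]
        # R^2 from regressing feature j on all remaining features.
        r2 = LinearRegression().fit(others, target).score(others, target)
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)


rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = a + 0.1 * rng.normal(size=300)   # nearly a copy of a
c = rng.normal(size=300)             # independent of a and b

vifs = vif(np.column_stack([a, b, c]))
for name, value in zip(["a", "b", "c"], vifs):
    print(f"VIF({name}) = {value:.1f}")
```

In this setup the two near-duplicate features should show VIFs well above 10, while the independent feature stays close to 1, so the threshold above would flag `a` and `b` as candidates for removal or regularisation.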