Evaluating Regression: Mean Squared Error (MSE) and R-Squared

MSE and R² are two of the most widely used regression evaluation metrics. MSE quantifies the average squared error, while R² expresses how much of the variance in the target the model explains.

Mean Squared Error (MSE) and RMSE

MSE = (1/n) Σ(yᵢ - ŷᵢ)², where yᵢ is the observed value, ŷᵢ is the prediction, and n is the number of samples. Taking the square root gives RMSE, which is in the same units as the target. Because the errors are squared, large errors are amplified, which makes MSE useful when big mistakes are especially costly.
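The effect of squaring can be seen in a small sketch. The data and the helper `mse` below are made up for illustration: two sets of predictions have the same total absolute error, but one concentrates it in a single large miss.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, written out from the formula (hypothetical helper)."""
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative values only, not real measurements
y_true = np.array([10.0, 12.0, 14.0, 16.0])
small_errors = np.array([11.0, 13.0, 15.0, 17.0])   # every prediction off by 1
one_big_error = np.array([10.0, 12.0, 14.0, 20.0])  # one prediction off by 4

# Both prediction sets have the same total absolute error
total_abs_small = float(np.sum(np.abs(y_true - small_errors)))
total_abs_big = float(np.sum(np.abs(y_true - one_big_error)))

mse_small = mse(y_true, small_errors)
mse_big = mse(y_true, one_big_error)
rmse_small = float(np.sqrt(mse_small))
rmse_big = float(np.sqrt(mse_big))

print(f"Total absolute error: {total_abs_small:.1f} vs {total_abs_big:.1f}")
print(f"MSE:  {mse_small:.2f} vs {mse_big:.2f}")
print(f"RMSE: {rmse_small:.2f} vs {rmse_big:.2f}")
```

The single large miss yields a higher MSE even though the total absolute error is identical, which is the behaviour you want when large mistakes should dominate the score.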

Computing MSE and RMSE

<pre><code class="language-python">from sklearn.metrics import mean_squared_error
import numpy as np

y_true = np.array([3.0, 2.5, 4.0, 5.0, 4.5])
y_pred = np.array([2.8, 2.9, 3.7, 5.2, 4.1])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)

print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")</code></pre>

R-Squared (Coefficient of Determination)

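R² = 1 - (SS_res / SS_tot), where SS_res = Σ(yᵢ - ŷᵢ)² is the sum of squared residuals and SS_tot = Σ(yᵢ - ȳ)² is the total sum of squares around the mean ȳ (n times the variance of y). R² = 1 is a perfect fit; R² = 0 means the model is no better than predicting the mean; a negative R² means it is worse than predicting the mean.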
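As a rough check of the definition, R² can be computed directly from SS_res and SS_tot. The helper `r_squared` is a hypothetical name used for illustration, and the two baselines (always predicting the mean, always predicting zero) are assumptions added here to show the reference points of the scale.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R² from its definition: 1 - SS_res / SS_tot (hypothetical helper)."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # sum of squared residuals
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([3.0, 2.5, 4.0, 5.0, 4.5])
y_pred = np.array([2.8, 2.9, 3.7, 5.2, 4.1])

# Baseline 1: always predict the mean of y, which gives R² = 0
mean_baseline = np.full_like(y_true, y_true.mean())
# Baseline 2: a deliberately poor constant prediction, which gives R² < 0
zero_baseline = np.zeros_like(y_true)

r2_model = r_squared(y_true, y_pred)
r2_mean = r_squared(y_true, mean_baseline)
r2_zero = r_squared(y_true, zero_baseline)

print(f"Model:          {r2_model:.4f}")
print(f"Mean baseline:  {r2_mean:.4f}")
print(f"Zero baseline:  {r2_zero:.4f}")
```

The mean baseline lands at zero and the poor constant prediction goes negative, so R² is bounded above by 1 but has no lower bound.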

Computing R-Squared

<pre><code class="language-python">from sklearn.metrics import r2_score

# Reuses y_true and y_pred from the MSE example above
r2 = r2_score(y_true, y_pred)
print(f"R²: {r2:.4f}")</code></pre>

Adjusted R-Squared

On the training data, R² never decreases when you add features to a least-squares model, even useless ones. Adjusted R² penalises the number of predictors: Adj. R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the sample count and p is the number of features. Prefer adjusted R² when comparing models with different numbers of features.
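scikit-learn does not, as far as this sketch assumes, provide a dedicated adjusted R² function, so the formula is usually applied by hand. The helper `adjusted_r2` and the R² values below are hypothetical, chosen only to show how a small gain in R² from one extra feature can still lower the adjusted score.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) (hypothetical helper)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof

# Illustrative numbers only: 50 samples, two nested models
n = 50
adj_small = adjusted_r2(r2=0.900, n_samples=n, n_features=5)
adj_large = adjusted_r2(r2=0.901, n_samples=n, n_features=6)  # one extra feature

print(f"5 features: R² = 0.900, adjusted R² = {adj_small:.4f}")
print(f"6 features: R² = 0.901, adjusted R² = {adj_large:.4f}")
```

In this made-up case the sixth feature raises R² slightly but lowers adjusted R², which suggests it does not earn its place in the model.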