Simple Linear Regression: Line of Best Fit

Simple linear regression is one of the most fundamental predictive models: it fits a straight line through your data to capture the relationship between a single feature and a continuous target.

The Linear Equation

The model assumes the target y is approximately a linear function of the feature x: y = β₀ + β₁x + ε, where β₀ is the intercept, β₁ is the slope, and ε is irreducible noise.

Slope and Intercept

The slope (β₁) tells you how much y changes for a one-unit increase in x. The intercept (β₀) is the predicted value of y when x equals zero. Together they uniquely define the line.
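To make the two parameters concrete, here is a minimal sketch that estimates them with the standard least-squares formulas for a single feature. It assumes ordinary least squares as the fitting criterion; the toy data and the `predict` helper are invented for illustration.

```python
import numpy as np

# Toy data lying exactly on y = 1 + 2x (values invented for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Least-squares estimates for one feature:
#   slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   intercept = y_mean - slope * x_mean
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean


def predict(x_new):
    """Hypothetical helper: evaluate the fitted line at x_new."""
    return intercept + slope * x_new


print(f"Slope: {slope:.2f}")
print(f"Intercept: {intercept:.2f}")

# The intercept is the prediction at x = 0, and the slope is the change
# in the prediction for a one-unit increase in x.
print(f"Prediction at x = 0: {predict(0.0):.2f}")
print(f"Change from x = 4 to x = 5: {predict(5.0) - predict(4.0):.2f}")
```

Because the toy points sit exactly on a line, the estimates recover that line; with noisy data they would only approximate the underlying slope and intercept.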

Fitting with scikit-learn

<pre><code class="language-python">import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Synthetic data: true slope 2.5, true intercept 1.0, Gaussian noise (sd = 1)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (50, 1))  # scikit-learn expects a 2-D feature array
y = 2.5 * X.ravel() + 1.0 + rng.normal(0, 1, 50)

# Fit the line and report the estimated parameters
model = LinearRegression().fit(X, y)
print(f"Intercept: {model.intercept_:.2f}")
print(f"Slope: {model.coef_[0]:.2f}")</code></pre>

Residuals and Goodness of Fit

Residuals are the signed vertical differences between the observed values and the fitted line (observed minus predicted). Analysing them reveals whether the linear assumption holds.

What Residuals Tell You

Ideally residuals are small, randomly distributed around zero, and show no pattern with respect to x. A funnel shape indicates heteroskedasticity (non-constant variance), and a curved pattern suggests the true relationship is non-linear.
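The checks described above can be sketched numerically. This is a rough illustration rather than a formal diagnostic. The synthetic data, the use of `np.polyfit` for the fit, and the two summary numbers (`spread_ratio`, `curvature`) are assumptions chosen for this example. In practice a residual-versus-x scatter plot is the usual first step.

```python
import numpy as np

# Synthetic data that satisfies the linear assumption (invented for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.5 * x + 1.0 + rng.normal(0, 1, 200)

# Fit a straight line; np.polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# 1) Centred on zero: with an intercept in the model, least-squares
#    residuals average to zero up to floating-point error.
mean_resid = residuals.mean()

# 2) Crude funnel check: compare residual spread for small x versus large x.
#    A ratio far from 1 hints at non-constant variance.
order = np.argsort(x)
low_half, high_half = np.array_split(residuals[order], 2)
spread_ratio = high_half.std() / low_half.std()

# 3) Crude curvature check: quadratic coefficient when regressing the
#    residuals on x. A value clearly away from 0 hints at non-linearity.
curvature = np.polyfit(x, residuals, 2)[0]

print(f"Mean residual: {mean_resid:.2e}")
print(f"Spread ratio (high x / low x): {spread_ratio:.2f}")
print(f"Quadratic coefficient of residuals: {curvature:.4f}")
```

For data generated from a genuinely linear model with constant noise, the spread ratio should sit near 1 and the quadratic coefficient near 0. What counts as "far" from those values is a judgment call that depends on sample size.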

R-Squared

R² measures what fraction of the variance in y is explained by the model. An R² of 1.0 means a perfect fit; 0.0 means the model does no better than predicting the mean. Use it as a quick sanity check, not the sole measure of quality.
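As a minimal sketch of that definition, the snippet below computes R² by hand as 1 minus the ratio of residual to total sum of squares, then compares it with scikit-learn's `r2_score` and `LinearRegression.score`. The synthetic data is an assumption made for illustration, and the values are computed on the training data only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data (invented for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (50, 1))
y = 2.5 * X.ravel() + 1.0 + rng.normal(0, 1, 50)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_pred) ** 2)    # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
r2_manual = 1.0 - ss_res / ss_tot

# The same quantity from scikit-learn
r2_sklearn = r2_score(y, y_pred)
r2_from_model = model.score(X, y)

# Baseline: always predicting the mean of y gives an R^2 of 0
r2_baseline = r2_score(y, np.full_like(y, y.mean()))

print(f"Manual R^2:        {r2_manual:.4f}")
print(f"r2_score:          {r2_sklearn:.4f}")
print(f"model.score:       {r2_from_model:.4f}")
print(f"Mean-baseline R^2: {r2_baseline:.4f}")
```

The three model values should agree up to floating-point error. Scoring the model on held-out data instead would give a more honest picture of predictive quality.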