Simple Linear Regression: Line of Best Fit
Simple linear regression is one of the most basic predictive models: it fits a straight line to your data to capture the relationship between a single feature and a continuous target.
The Linear Equation
The model assumes the target y is approximately a linear function of the feature x: y = β₀ + β₁x + ε, where β₀ is the intercept, β₁ is the slope, and ε is irreducible noise.
Slope and Intercept
The slope (β₁) tells you how much y changes for a one-unit increase in x. The intercept (β₀) is the predicted value of y when x equals zero. Together they uniquely define the line.
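To make the two parameters concrete, here is a minimal sketch of how they can be estimated, assuming "best fit" means ordinary least squares (the text above does not name the criterion). The helper name fit_line and the sample data are hypothetical, chosen so the points lie exactly on y = 1 + 2x.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical noise-free data on the line y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
intercept, slope = fit_line(x, y)
```

With noise-free data the sketch should recover the intercept and slope used to generate the points; with real data the estimates only approximate the underlying values.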
Fitting with scikit-learn
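A minimal sketch of one way to fit the model with scikit-learn's LinearRegression estimator. The data here is synthetic and hypothetical: it is generated from an assumed intercept of 3.0 and slope of 2.0 plus Gaussian noise, purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: y = 3 + 2x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=200)

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
X = x.reshape(-1, 1)

model = LinearRegression()
model.fit(X, y)

intercept = model.intercept_   # estimated intercept
slope = model.coef_[0]         # estimated slope for the single feature

# Predict the target for a new feature value, x = 5
y_new = model.predict(np.array([[5.0]]))
```

The fitted intercept_ and coef_ should land close to the values used to generate the data, though not exactly on them because of the noise. The reshape step matters: passing a 1-D array as the feature matrix raises an error.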
Residuals and Goodness of Fit
Residuals are the signed vertical differences between the observed data points and the fitted line (observed value minus predicted value). Analysing them reveals whether the linear assumption holds.
What Residuals Tell You
Ideally residuals are small, randomly distributed around zero, and show no pattern with respect to x. A funnel shape indicates heteroskedasticity (non-constant variance), and a curved pattern suggests the true relationship is non-linear.
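The checks described above can be sketched numerically. This example uses hypothetical synthetic data whose noise grows with x, fits a line with NumPy's polyfit, and compares the residual spread in the lower and upper halves of the x range. The half-split comparison is a crude illustration chosen for this sketch, not a formal test.

```python
import numpy as np

# Hypothetical data with noise that grows with x (heteroskedastic)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(1, 10, size=400))
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 * x)

# polyfit returns coefficients from highest degree down: [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Least-squares residuals average to zero when the fit includes an intercept
mean_residual = residuals.mean()

# Crude funnel check: residual spread for small x versus large x
half = len(x) // 2
spread_low = residuals[:half].std()
spread_high = residuals[half:].std()
spread_ratio = spread_high / spread_low
```

A spread_ratio well above 1 points to the funnel shape. In practice a scatter plot of residuals against x is the usual first look, since it also exposes curved patterns that a single ratio would miss.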
R-Squared
R² measures what fraction of the variance in y is explained by the model. An R² of 1.0 means a perfect fit; 0.0 means the model does no better than predicting the mean. Use it as a quick sanity check, not the sole measure of quality.
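As a small illustration of the two reference points above, the sketch below computes R-squared from its usual definition, one minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared and the sample values are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Fraction of the variance in y_true explained by y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical target values
y = np.array([1.0, 3.0, 5.0, 7.0])

# Perfect predictions leave no residual variation
r2_perfect = r_squared(y, y)

# Always predicting the mean explains none of the variation
r2_mean = r_squared(y, np.full_like(y, y.mean()))
```

scikit-learn provides the same quantity through r2_score in sklearn.metrics and through the score method of a fitted LinearRegression.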