Logistic Regression: Predicting Probabilities
Despite its name, logistic regression is a classification model that outputs the probability of belonging to a class rather than a raw numeric prediction.
From Linear Output to Probabilities
Logistic regression computes a linear combination of the features and passes it through the sigmoid function, which squashes any real number into the open interval (0, 1) so that the result can be read as a probability.
The Sigmoid Transformation
$P(y=1 \mid x) = \sigma(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)$, where $\sigma(z) = 1 / (1 + e^{-z})$. Outputs near 0 or 1 indicate a confident prediction; outputs near 0.5 indicate uncertainty.
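As a minimal sketch of the formula above, the snippet below computes the linear score and maps it through the sigmoid. The coefficients `beta0` and `betas` are hypothetical values chosen only for illustration, and `predict_proba` is a hypothetical helper name, not a reference to any particular library. The sigmoid is written in a numerically stable form so that large negative scores do not overflow `math.exp`.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + e^(-z)), computed stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) to avoid overflow
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(beta0: float, betas: list[float], x: list[float]) -> float:
    """P(y=1 | x) = sigma(beta0 + beta1*x1 + ... + betan*xn)."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)

# Hypothetical coefficients, for illustration only
beta0, betas = -1.0, [2.0, -0.5]
for x in ([0.0, 0.0], [0.5, 0.0], [3.0, 1.0]):
    print(x, round(predict_proba(beta0, betas, x), 3))
```

The middle input gives a linear score of exactly 0, so its probability is 0.5 (maximal uncertainty), while the last input has a large positive score and lands close to 1.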
Fitting Logistic Regression
The Log-Loss Objective
Logistic regression is trained by maximising the log-likelihood (equivalently minimising binary cross-entropy loss), which strongly penalises confident wrong predictions.
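For one example with label $y \in \{0, 1\}$ and predicted probability $p = P(y=1 \mid x)$, the binary cross-entropy is $-[y \log p + (1 - y)\log(1 - p)]$, and the training objective averages it over all examples. The sketch below is a plain-Python version; `log_loss` is a hypothetical helper name, and the clipping constant `eps` is an assumption added so that `math.log` never receives exactly 0.

```python
import math

def log_loss(y_true: list[int], p_pred: list[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip p away from 0 and 1 so the logarithms stay finite
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Loss for a single positive example (y = 1) at several predicted probabilities
for p in (0.9, 0.6, 0.5, 0.1, 0.01):
    print(p, round(log_loss([1], [p]), 3))
```

The penalty grows sharply as the prediction becomes confidently wrong: predicting p = 0.01 for a positive example costs about 4.6, more than 40 times the roughly 0.105 incurred by the confident correct prediction p = 0.9.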
Why Cross-Entropy Instead of MSE?
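MSE applied to sigmoid outputs gives a loss that is non-convex in the coefficients β, so it can have multiple local minima and flat plateaus; its gradient also shrinks toward zero when the sigmoid saturates, so confidently wrong predictions are corrected slowly. Cross-entropy (log-loss) is convex in β, so every local minimum is a global minimum, and gradient descent with a suitably small learning rate converges toward it. If the classes are perfectly linearly separable, however, no finite minimiser exists and the coefficients grow without bound unless the loss is regularised. Cross-entropy also has a natural probabilistic interpretation: it is the negative log-likelihood of the labels under a Bernoulli model.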
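To make the fitting procedure concrete, here is a minimal batch gradient descent sketch on the mean log-loss. It relies on the standard result that the gradient with respect to each coefficient is $\frac{1}{m}\sum_i (p_i - y_i)\,x_{ij}$, with $x_{i0} = 1$ for the intercept. The function name `fit_logistic`, the toy data, the learning rate and the iteration count are all hypothetical choices for this illustration, and no regularisation is applied.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.5, n_iter: int = 5000) -> list[float]:
    """Batch gradient descent on mean log-loss; returns [beta0, beta1, ..., betan]."""
    m = len(X)
    beta = [0.0] * (len(X[0]) + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear score z
            grad[0] += err
            for j, v in enumerate(xi, start=1):
                grad[j] += err * v
        # Step against the averaged gradient
        beta = [b - lr * g / m for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: one feature, overlapping classes (not linearly separable)
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logistic(X, y)
print(beta)
```

Because the objective is convex and this toy data is not linearly separable, starting from any other initial coefficients would lead to the same finite solution. On separable data the same loop would keep increasing the slope, which is the unbounded-coefficient case described above.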