Logistic Regression: Predicting Probabilities

Despite its name, logistic regression is a classification model that outputs the probability of belonging to a class rather than a raw numeric prediction.

From Linear Output to Probabilities

Logistic regression takes a linear combination of features and passes it through the sigmoid function to squash the output to the (0, 1) range, turning it into a probability.

The Sigmoid Transformation

P(y=1|x) = σ(β₀ + β₁x₁ + ... + βₙxₙ), where σ(z) = 1 / (1 + e^(-z)). Values near 0 and 1 indicate high confidence; values near 0.5 indicate uncertainty.
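As a minimal sketch of this transformation, the snippet below computes the linear score and passes it through the sigmoid. The intercept, coefficients, and feature vector are hypothetical values chosen only for illustration; they do not come from a fitted model.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical intercept, coefficients, and feature vector (illustration only)
beta0 = -1.0
beta = np.array([0.8, -0.5])
x = np.array([2.0, 1.0])

z = beta0 + beta @ x   # linear score: -1.0 + 1.6 - 0.5 = 0.1
p = sigmoid(z)         # P(y=1 | x)
print(f"z = {z:.2f}, P(y=1|x) = {p:.4f}")
```

A score close to zero gives a probability close to 0.5, which matches the reading of 0.5 as the point of maximum uncertainty.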

Fitting Logistic Regression

<pre><code class="language-python">from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# In this dataset, class 0 is malignant and class 1 is benign
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardise the features, then fit the classifier
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)

# Predicted probabilities for both classes; columns follow clf.classes_ = [0, 1]
probs = clf.predict_proba(X_te)
print("P(malignant | X_te[0]):", round(float(probs[0, 0]), 4))
print("Accuracy:", round(float(clf.score(X_te, y_te)), 4))</code></pre>

The Log-Loss Objective

Logistic regression is trained by maximising the log-likelihood (equivalently minimising binary cross-entropy loss), which strongly penalises confident wrong predictions.
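To make the penalty concrete, here is a small sketch of binary cross-entropy written directly from its definition. The helper name `binary_cross_entropy` and the clipping constant `eps` are assumptions made for this example, not part of any library API.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood of the labels under a Bernoulli model."""
    y = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs for predictions of exactly 0 or 1
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# The true label is 1 in every case; only the predicted probability changes
loss_confident_right = binary_cross_entropy([1], [0.9])
loss_uncertain = binary_cross_entropy([1], [0.5])
loss_confident_wrong = binary_cross_entropy([1], [0.01])

print(f"p = 0.90: {loss_confident_right:.4f}")
print(f"p = 0.50: {loss_uncertain:.4f}")
print(f"p = 0.01: {loss_confident_wrong:.4f}")
```

The loss grows without bound as the predicted probability of the true class approaches zero, which is what "strongly penalises confident wrong predictions" means in practice.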

Why Cross-Entropy Instead of MSE?

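MSE applied to sigmoid outputs produces a loss that is non-convex in the model parameters: it can have flat regions and local minima, and its gradient vanishes when the model is confidently wrong, which slows learning exactly where correction is most needed. Cross-entropy (log-loss) is convex in the parameters, so any minimum gradient descent finds is a global one, provided the step size is suitable. One caveat: on perfectly separable data without regularisation, the loss has no finite minimiser and the coefficients grow without bound. Cross-entropy also has the natural probabilistic interpretation of negative log-likelihood under a Bernoulli model.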
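The difference in curvature can be checked numerically. The sketch below assumes the simplest possible setting, a single training example with x = 1 and y = 1 and a single weight w, and estimates the second derivative of each loss on a grid using central differences. It illustrates the convexity claim for this one-dimensional case only; it is not a proof for the general model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def second_diff(f, h):
    """Central second difference: approximates curvature at interior grid points."""
    return (f[2:] - 2.0 * f[1:-1] + f[:-2]) / h**2

w = np.linspace(-6.0, 6.0, 601)  # grid of candidate weights
h = w[1] - w[0]
p = sigmoid(w)                   # predicted P(y=1) for the single example x = 1

mse = (1.0 - p) ** 2             # squared error for true label y = 1
bce = -np.log(p)                 # cross-entropy for true label y = 1

mse_curv = second_diff(mse, h)
bce_curv = second_diff(bce, h)

print("MSE curvature negative somewhere:", bool((mse_curv < 0).any()))
print("Cross-entropy curvature negative anywhere:", bool((bce_curv < 0).any()))
```

In this setting the squared-error curve bends downward wherever the prediction is badly wrong (roughly w below -0.7), while the cross-entropy curve has positive curvature across the whole grid.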