Cross-Entropy Math vs. Code

Cross-entropy is the standard loss function for classification. Written mathematically it looks clean: $L = -\sum_c y_c \log \hat{y}_c$. In code, however, implementing this formula naively can overflow or produce infinite and NaN losses. This topic covers the math, the naive implementation, and the numerically stable version used in production frameworks.


The Formula and What It Measures

For a single sample with true one-hot label $y$ and predicted probability vector $\hat{y}$ (output by Softmax), the loss is: $$L = -\sum_{c=1}^{C} y_c \log \hat{y}_c$$

Because $y$ is one-hot (all zeros except one 1), this collapses to: $L = -\log \hat{y}_{\text{true}}$. The loss is low when the model assigns high probability to the correct class, and high when it assigns low probability.
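To make that relationship concrete, the short sketch below evaluates $-\log \hat{y}_{\text{true}}$ for a few probabilities. The probability values are illustrative choices, not taken from any model.

```python
import numpy as np

# Illustrative probabilities assigned to the correct class (assumed values)
probs_true = np.array([0.99, 0.5, 0.01])

# One-hot cross-entropy reduces to -log of the correct-class probability
losses = -np.log(probs_true)

for p, loss in zip(probs_true, losses):
    print(f"p_true = {p:.2f} -> loss = {loss:.4f}")
# p_true = 0.99 -> loss = 0.0101
# p_true = 0.50 -> loss = 0.6931
# p_true = 0.01 -> loss = 4.6052
```

The loss grows without bound as the correct-class probability approaches 0, which is the source of the numerical trouble in a naive implementation.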

Naive NumPy Implementation

<pre><code class="language-python">import numpy as np

y_true = np.array([0, 1, 0])        # true class is index 1
y_pred = np.array([0.1, 0.8, 0.1])  # model's prediction

# Full sum (works because only one y_c is 1)
loss = -np.sum(y_true * np.log(y_pred))
print(loss)       # 0.2231 (-log(0.8))

# Shortcut: index directly into the correct class
loss_fast = -np.log(y_pred[1])
print(loss_fast)  # 0.2231
</code></pre>

The Numerical Stability Problem

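If the model is very wrong, $\hat{y}_{\text{true}}$ can underflow to exactly 0 in floating point. Then $\log(0)$ evaluates to $-\infty$, the loss becomes infinite, and the resulting inf and NaN values poison training. Large logits cause a related failure: the exponentials inside Softmax overflow before the log is ever taken. The standard fix is the log-sum-exp trick: instead of computing Softmax and then taking the log, combine the two into a single stable operation called log-softmax, $\log \hat{y}_c = z_c - \log \sum_j e^{z_j}$, where $z$ is the vector of raw logits and the largest logit is subtracted from every entry before exponentiating.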
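Both failure modes are easy to reproduce. The sketch below uses a hypothetical helper, `naive_cross_entropy`, that computes Softmax and then the log with no stabilization; the extreme logit values are assumptions chosen to trigger underflow and overflow in 64-bit floats.

```python
import numpy as np

def naive_cross_entropy(logits, true_idx):
    # Softmax followed by log, with no stabilization (hypothetical helper)
    exp = np.exp(logits)
    probs = exp / np.sum(exp)
    return -np.log(probs[true_idx])

# Suppress NumPy's floating-point warnings so the raw results are visible
with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    # Confidently wrong model: exp(-800) underflows to 0, so log(0) = -inf
    loss_underflow = naive_cross_entropy(np.array([0.0, -800.0, 0.0]), 1)
    # Large logits: exp(1000) overflows to inf, and inf / inf = nan
    loss_overflow = naive_cross_entropy(np.array([1000.0, 0.0, 0.0]), 0)

print(loss_underflow)  # inf
print(loss_overflow)   # nan
```

Note that the second case fails even though the model is confidently correct: the overflow happens inside Softmax, before the loss is computed.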

Stable Cross-Entropy

<pre><code class="language-python">import numpy as np

def stable_cross_entropy(logits, y_true):
    """
    logits: raw model output (before softmax)
    y_true: integer class index
    """
    # Subtract max for numerical stability
    shifted = logits - np.max(logits)
    log_sum_exp = np.log(np.sum(np.exp(shifted)))
    log_softmax = shifted - log_sum_exp
    return -log_softmax[y_true]

logits = np.array([2.0, 4.0, 1.0])  # raw scores
print(stable_cross_entropy(logits, y_true=1))  # loss for class 1, about 0.1698
</code></pre>
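The same idea extends to a batch of samples. The following is a minimal sketch, assuming logits of shape (N, C), integer targets of shape (N,), and a mean reduction over the batch; the function name `batched_stable_cross_entropy` is hypothetical.

```python
import numpy as np

def batched_stable_cross_entropy(logits, targets):
    """
    logits:  array of shape (N, C), raw scores before softmax
    targets: integer array of shape (N,), true class indices
    Returns the mean loss over the batch (an assumed reduction).
    """
    # Subtract each row's max so the largest exponent is exp(0) = 1
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    log_sum_exp = np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    log_softmax = shifted - log_sum_exp
    # Pick out the log-probability of the true class in every row
    per_sample = -log_softmax[np.arange(len(targets)), targets]
    return per_sample.mean()

batch_logits = np.array([[2.0, 4.0, 1.0],
                         [1000.0, 0.0, 0.0]])  # second row would overflow naively
batch_targets = np.array([1, 0])

mean_loss = batched_stable_cross_entropy(batch_logits, batch_targets)
print(mean_loss)  # finite, about 0.0849
```

Because the shift is applied per row, one sample with extreme logits cannot destabilize the others in the batch.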

Cross-Entropy in PyTorch

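PyTorch's nn.CrossEntropyLoss handles numerical stability automatically. It expects raw logits (before softmax), not probabilities, because it applies log-softmax internally. Passing probabilities into it is a common mistake: no error is raised, but softmax is effectively applied twice, so the loss value and its gradients are wrong.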

PyTorch Usage

<pre><code class="language-python">import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# Logits (raw, unnormalized scores), NOT probabilities
logits = torch.tensor([[2.0, 4.0, 1.0]])  # shape (1, 3)
target = torch.tensor([1])                # true class index

loss = loss_fn(logits, target)
print(loss.item())  # 0.1698

# ❌ Common mistake: passing softmax output
# probs = torch.softmax(logits, dim=1)
# loss_fn(probs, target)  # wrong: softmax is applied twice, giving bad gradients
</code></pre>
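To see why feeding probabilities is harmful, the double application of softmax can be simulated in plain NumPy. This is only a sketch of the arithmetic, assuming the loss applies log-softmax to whatever it receives; the `log_softmax` helper is a hypothetical stand-in, not PyTorch's implementation.

```python
import numpy as np

def log_softmax(z):
    # Stable log-softmax via the max-subtraction trick (hypothetical helper)
    shifted = z - np.max(z)
    return shifted - np.log(np.sum(np.exp(shifted)))

logits = np.array([2.0, 4.0, 1.0])
target = 1

# Correct: the loss sees raw logits
loss_correct = -log_softmax(logits)[target]

# Mistake: probabilities are fed in and treated as if they were logits,
# so softmax is effectively applied a second time
probs = np.exp(log_softmax(logits))
loss_double = -log_softmax(probs)[target]

print(round(loss_correct, 4))      # 0.1698
print(loss_double > loss_correct)  # True
```

Probabilities all lie between 0 and 1, so the second softmax flattens the distribution. The reported loss then stays well above the true value even when the prediction is confident and correct.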