Cross-Entropy Math vs. Code
Cross-entropy is the standard loss function for classification. Written mathematically it looks clean: $L = -\sum_c y_c \log \hat{y}_c$. But in code, implementing this formula naïvely can overflow or take the log of zero, which produces infinite or NaN losses. This topic shows you the math, the naive implementation, and the numerically stable version used in production frameworks.
The Formula and What It Measures
For a single sample with true one-hot label $y$ and predicted probability vector $\hat{y}$ (output by Softmax), the loss is: $$L = -\sum_{c=1}^{C} y_c \log \hat{y}_c$$
Because $y$ is one-hot (all zeros except one 1), this collapses to: $L = -\log \hat{y}_{\text{true}}$. The loss is low when the model assigns high probability to the correct class, and high when it assigns low probability.
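As a quick worked example (the probability values here are made up for illustration, and $\log$ is the natural logarithm), take $\hat{y} = (0.7, 0.2, 0.1)$. If the true class is the first one, the loss is small; if it is the third, the loss is much larger:
$$L = -\log 0.7 \approx 0.357 \qquad \text{versus} \qquad L = -\log 0.1 \approx 2.303$$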
Naive NumPy Implementation
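The following is a minimal sketch of the naive approach, assuming a single sample given as a 1-D array of logits and an integer class index. The function names `softmax_naive` and `cross_entropy_naive` are illustrative choices, not taken from any library.

```python
import numpy as np

def softmax_naive(logits):
    # Exponentiate directly: fine for small logits, overflows for large ones
    exps = np.exp(logits)
    return exps / exps.sum()

def cross_entropy_naive(logits, true_class):
    # One-hot label: the sum collapses to -log of the true-class probability
    probs = softmax_naive(logits)
    return -np.log(probs[true_class])

logits = np.array([2.0, 1.0, 0.1])  # illustrative values
loss = cross_entropy_naive(logits, true_class=0)
print(round(float(loss), 3))  # 0.417
```

For moderate logits like these the result is correct; the trouble only appears at the extremes.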
The Numerical Stability Problem
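If the model is very wrong, $\hat{y}_{\text{true}}$ can underflow to exactly 0 in floating point. Then $\log(0) = -\infty$, so the loss becomes infinite, and the gradients computed from it turn into NaN values that poison training. Large logits cause a related failure: $e^{z}$ overflows to infinity inside Softmax, and $\infty / \infty$ is NaN. The standard fix is the log-sum-exp trick: instead of computing Softmax and then taking the log, combine the two into a single stable operation called log-softmax.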
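Both failure modes are easy to reproduce. The sketch below redefines a naive loss (the name `cross_entropy_naive` and the extreme logit values are assumptions chosen to trigger the problem) and silences NumPy's floating-point warnings so the results can be inspected directly.

```python
import numpy as np

def cross_entropy_naive(logits, true_class):
    # Softmax followed by log, with no stabilization
    exps = np.exp(logits)
    probs = exps / exps.sum()
    return -np.log(probs[true_class])

with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    # exp(-1000) underflows to 0, so the true-class probability is exactly 0
    loss_underflow = cross_entropy_naive(np.array([0.0, -1000.0]), 1)
    # exp(1000) overflows to inf, so the softmax ratio is inf / inf
    loss_overflow = cross_entropy_naive(np.array([1000.0, 0.0, 0.0]), 0)

print(loss_underflow, loss_overflow)  # inf nan
```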
Stable Cross-Entropy
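A stable version can work on logits $z$ directly, using the identity $\log \text{softmax}(z)_i = (z_i - m) - \log \sum_j e^{z_j - m}$ with $m = \max_j z_j$. Subtracting $m$ leaves the result unchanged but guarantees the largest exponent is $e^0 = 1$, so nothing overflows and the sum is at least 1. The code below is a minimal sketch under the same single-sample assumptions as before; `log_softmax` and `cross_entropy_stable` are illustrative names.

```python
import numpy as np

def log_softmax(logits):
    # Shift by the max so the largest exponent is exp(0) = 1
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    # log-sum-exp of the shifted logits; the sum is always >= 1
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))

def cross_entropy_stable(logits, true_class):
    # Never forms the probability itself, so log(0) cannot occur
    return -log_softmax(logits)[true_class]

loss_ok = cross_entropy_stable(np.array([2.0, 1.0, 0.1]), 0)
loss_big = cross_entropy_stable(np.array([1000.0, 0.0, 0.0]), 0)
loss_wrong = cross_entropy_stable(np.array([0.0, -1000.0]), 1)
print(round(float(loss_ok), 3), float(loss_wrong))  # 0.417 1000.0
```

The inputs that break a naive softmax-then-log computation now give finite values: a confident correct prediction yields a loss of zero, and a confident wrong one yields a large but finite loss.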
Cross-Entropy in PyTorch
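PyTorch's nn.CrossEntropyLoss handles numerical stability automatically: it applies log-softmax internally, so it expects raw logits (before softmax), not probabilities. Passing probabilities into it is a common mistake. Softmax is then effectively applied twice, which silently produces incorrect loss values and gradients without raising any error.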
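The difference is easy to see in a small sketch. The logit values and the batch of one sample are illustrative; targets are given as class indices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1]])  # shape (batch=1, classes=3)
target = torch.tensor([0])                # class index, not one-hot

criterion = nn.CrossEntropyLoss()

# Correct: pass raw logits
loss_correct = criterion(logits, target)

# Equivalent explicit form: log-softmax followed by negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# Mistake: passing probabilities, so softmax is applied a second time
probs = torch.softmax(logits, dim=1)
loss_mistake = criterion(probs, target)

print(loss_correct.item(), loss_manual.item(), loss_mistake.item())
```

The first two values agree, while the third differs even though the call runs without complaint, which is why this mistake tends to go unnoticed.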