Calculating Derivatives Programmatically
Backpropagation requires computing derivatives of a loss function with respect to every parameter. In theory, you derive these by hand. In practice, you use two approaches: numerical differentiation (approximating with finite differences) for debugging, and automatic differentiation (used by PyTorch/JAX) for production.
Numerical Differentiation (Finite Differences)
The derivative definition is $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$. With a very small $h$ (like $10^{-7}$), we can approximate this numerically. This is useful for gradient checking — verifying your analytical gradient is correct.
A Gradient Checker
The central difference formula (using $x+h$ and $x-h$) is more accurate than the one-sided formula because its error is $O(h^2)$ instead of $O(h)$.
Computing Gradients for Multivariable Functions
Most AI functions take a vector of parameters. We compute the gradient by calculating the partial derivative for each parameter while holding the rest constant.
Vector Gradient
Automatic Differentiation with PyTorch
Numerical differentiation is slow — it needs $2n$ function calls for $n$ parameters. Automatic differentiation (autograd) computes exact gradients in a single backward pass by tracking operations in a computation graph. This is what makes training billion-parameter models feasible.