Power Rule, Product Rule, and Quotient Rule
Calculating derivatives using the limit definition from scratch is mathematically rigorous but computationally exhausting. To make differentiation practical, calculus provides a set of algebraic shortcuts or rules. These rules allow us to quickly and systematically find the derivatives of complex functions by breaking them down into simpler components. These formulas form the algorithmic foundation of automatic differentiation libraries like PyTorch and TensorFlow.
The Core Differentiation Rules
Differentiating complex expressions is made possible by three primary rules: the Power Rule, the Product Rule, and the Quotient Rule. These rules apply to combinations of functions, allowing us to compute derivatives term by term.
The Power Rule
The Power Rule is used to differentiate variables raised to a constant exponent. It states that for any real number $n$:
$$\frac{d}{dx}[x^n] = n x^{n-1}$$
For example, the derivative of $x^3$ is $3x^2$, and the derivative of $x^2$ is $2x$.
The Product and Quotient Rules
The Product Rule is used when differentiating two functions multiplied together: $\frac{d}{dx}[u \cdot v] = u'v + uv'$. The Quotient Rule is used for one function divided by another: $\frac{d}{dx}[\frac{u}{v}] = \frac{u'v - uv'}{v^2}$, where $v \neq 0$.
Application in Neural Networks
Neural networks utilize these rules to compute derivatives for activations, loss functions, and weight updates. In particular, non-linear activation functions rely on these shortcuts for efficient backpropagation.
Differentiating Polynomial Transformations
Many standard models, such as polynomial regression or linear layers with regularization, rely directly on the Power Rule to calculate parameter derivatives during training.
Differentiating Activation Functions
Consider the Sigmoid activation function, $\sigma(x) = \frac{1}{1 + e^{-x}}$. Using the Quotient and Chain rules, we can derive its derivative as:
$$\sigma'(x) = \sigma(x)(1 - \sigma(x))$$
This incredibly simple, self-referential derivative is computed instantly by hardware during backpropagation.