Power Rule, Product Rule, and Quotient Rule

Calculating derivatives using the limit definition from scratch is mathematically rigorous but computationally exhausting. To make differentiation practical, calculus provides a set of algebraic shortcuts or rules. These rules allow us to quickly and systematically find the derivatives of complex functions by breaking them down into simpler components. These formulas form the algorithmic foundation of automatic differentiation libraries like PyTorch and TensorFlow.

The Core Differentiation Rules

Differentiating complex expressions is made possible by three primary rules: the Power Rule, the Product Rule, and the Quotient Rule. These rules apply to combinations of functions, allowing us to compute derivatives term by term.

The Power Rule

The Power Rule is used to differentiate variables raised to a constant exponent. It states that for any real number $n$: $$\frac{d}{dx}[x^n] = n x^{n-1}$$ For example, the derivative of $x^3$ is $3x^2$, and the derivative of $x^2$ is $2x$.

The Product and Quotient Rules

The Product Rule is used when differentiating two functions multiplied together: $\frac{d}{dx}[u \cdot v] = u'v + uv'$. The Quotient Rule is used for one function divided by another: $\frac{d}{dx}[\frac{u}{v}] = \frac{u'v - uv'}{v^2}$, where $v \neq 0$.

Application in Neural Networks

Neural networks utilize these rules to compute derivatives for activations, loss functions, and weight updates. In particular, non-linear activation functions rely on these shortcuts for efficient backpropagation.

Differentiating Polynomial Transformations

Many standard models, such as polynomial regression or linear layers with regularization, rely directly on the Power Rule to calculate parameter derivatives during training.

Differentiating Activation Functions

Consider the Sigmoid activation function, $\sigma(x) = \frac{1}{1 + e^{-x}}$. Using the Quotient and Chain rules, we can derive its derivative as: $$\sigma'(x) = \sigma(x)(1 - \sigma(x))$$ This incredibly simple, self-referential derivative is computed instantly by hardware during backpropagation.