Coding a Basic Gradient Descent Loop
Gradient descent is the optimisation engine behind most modern machine-learning models. It searches for a minimum of a function by repeatedly taking small steps in the direction of steepest descent, that is, against the gradient. Writing it from scratch, with no library, makes the mechanism concrete before you use PyTorch or TensorFlow.
The Three-Line Algorithm
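At every step of gradient descent you: (1) compute the gradient at the current parameter value, (2) multiply it by the learning rate, and (3) subtract the result from the current parameter. In symbols, $x \leftarrow x - \alpha f'(x)$, where $\alpha$ is the learning rate. Repeat until the gradient is near zero or a step limit is reached.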
Minimising $f(x) = x^2$
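The three steps can be sketched in plain Python as follows. This is a minimal illustration rather than a definitive implementation: the function names, the starting point `x0=5.0`, the learning rate of 0.1, the tolerance, and the step limit are all assumptions chosen for the example. The only mathematical fact it relies on is that the derivative of $f(x) = x^2$ is $2x$.

```python
def gradient_descent(grad, x0, learning_rate=0.1, tolerance=1e-6, max_steps=1000):
    """Minimise a 1-D function given its gradient; returns (x, steps_taken)."""
    x = x0
    for step in range(max_steps):
        g = grad(x)                  # (1) compute the current gradient
        if abs(g) < tolerance:       # stop once the slope is nearly flat
            return x, step
        x = x - learning_rate * g    # (2) scale by the learning rate, (3) subtract
    return x, max_steps              # step limit reached without meeting the tolerance


def grad_f(x):
    """Derivative of f(x) = x**2."""
    return 2 * x


# Illustrative starting point; any non-zero value shows the same behaviour.
x_min, steps = gradient_descent(grad_f, x0=5.0)
print(f"x = {x_min:.8f} after {steps} steps")
```

The loop should stop well before the step limit, with `x_min` very close to the true minimiser $x = 0$. Passing the gradient in as a function keeps the loop itself independent of the particular $f$ being minimised.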
Learning Rate: Too High, Too Low, Just Right
The learning rate $\alpha$ controls step size. If it's too large, the algorithm overshoots the minimum and may diverge. If it's too small, convergence is painfully slow. Finding the right value is one of the key skills in training AI models.
Comparing Learning Rates
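For $f(x) = x^2$ the update simplifies to $x \leftarrow (1 - 2\alpha)x$, so each step multiplies $x$ by a fixed factor and the iteration shrinks towards zero only when $0 < \alpha < 1$. The sketch below runs a fixed number of steps for three learning rates to make that visible. The specific rates (0.001, 0.1, 1.1), the starting point, and the step count are assumptions picked for illustration.

```python
def run(learning_rate, x0=5.0, steps=20):
    """Run a fixed number of gradient descent steps on f(x) = x**2."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # gradient of x**2 is 2x
    return x


# Hypothetical learning rates chosen to show the three regimes.
rates = {"too low": 0.001, "just right": 0.1, "too high": 1.1}
results = {label: run(rate) for label, rate in rates.items()}

for label, rate in rates.items():
    print(f"{label:>10} (alpha = {rate}): x after 20 steps = {results[label]:.4f}")
```

With the smallest rate, $x$ barely moves from its starting value of 5; with 0.1 it ends close to zero; with 1.1 the factor is $-1.2$, so $x$ flips sign every step and grows in magnitude, which is divergence.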
Linear Regression via Gradient Descent
A more realistic example: fit a straight line $\hat{y} = wx + b$ to data by minimising MSE loss $L = \frac{1}{n}\sum(\hat{y}_i - y_i)^2$. The gradients for $w$ and $b$ are derived analytically, and we update both parameters simultaneously each step.
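Differentiating the MSE loss gives $\frac{\partial L}{\partial w} = \frac{2}{n}\sum(\hat{y}_i - y_i)x_i$ and $\frac{\partial L}{\partial b} = \frac{2}{n}\sum(\hat{y}_i - y_i)$. A minimal sketch of the resulting loop in plain Python follows. The function name `fit_line`, the learning rate, the step count, and the small made-up data set (points lying exactly on $y = 2x + 1$) are assumptions for illustration, not part of the original text.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y_hat = w*x + b by gradient descent on mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Prediction errors (y_hat_i - y_i) under the current w and b
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Both gradients come from the old (w, b), so the update is simultaneous.
        w, b = w - learning_rate * grad_w, b - learning_rate * grad_b
    return w, b


# Made-up data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

On this noise-free data the fitted values should land very close to $w = 2$ and $b = 1$. Computing both gradients before touching either parameter is what makes the update simultaneous; updating $w$ first and then using the new $w$ to compute the gradient for $b$ would be a different algorithm.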