Coding a Basic Gradient Descent Loop

Gradient descent is the optimisation method behind most modern machine-learning models. It searches for a minimum of a function by repeatedly taking small steps in the direction of steepest descent, which is the negative of the gradient. Writing it from scratch, with no library, makes the mechanism concrete before you use PyTorch or TensorFlow.

The Three-Line Algorithm

At every step of gradient descent you: (1) compute the gradient at the current parameter value, (2) multiply it by the learning rate, and (3) subtract the result from the current parameter. Repeat until the gradient is near zero.
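The three steps, together with the "repeat until the gradient is near zero" stopping rule, can be sketched as a small reusable function. This is a minimal illustration rather than a library routine: the names `descend`, `grad_fn`, `tol`, and `max_steps` are hypothetical, and the tolerance of `1e-6` is an arbitrary choice.

```python
def descend(grad_fn, x0, lr=0.1, tol=1e-6, max_steps=10_000):
    """Run gradient descent on a one-parameter function.

    grad_fn   -- callable returning the derivative at x (hypothetical name)
    tol       -- stop once |gradient| drops below this (illustrative value)
    max_steps -- safety cap so a diverging run still terminates
    """
    x = x0
    for step in range(max_steps):
        gradient = grad_fn(x)       # (1) compute the current gradient
        if abs(gradient) < tol:     # gradient is near zero: stop early
            return x, step
        update = lr * gradient      # (2) multiply by the learning rate
        x = x - update              # (3) subtract from the parameter
    return x, max_steps


# Example: f(x) = x**2 has derivative f'(x) = 2x
x_min, steps_taken = descend(lambda x: 2 * x, x0=10.0)
print(f"x = {x_min:.8f} after {steps_taken} steps")
```

Stopping on the gradient magnitude, instead of running a fixed number of steps, means the loop does no more work than the chosen tolerance requires.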

Minimising $f(x) = x^2$

<pre><code class="language-python">x = 10.0   # starting point
lr = 0.1   # learning rate (step size)
history = [x]

for step in range(50):
    gradient = 2 * x        # f'(x) = 2x
    x = x - lr * gradient   # take a step downhill
    history.append(x)

print(f"Minimum found at x = {x:.6f}")  # ≈ 0.0
</code></pre>

Learning Rate: Too High, Too Low, Just Right

The learning rate $\alpha$ controls step size. If it's too large, the algorithm overshoots the minimum and may diverge. If it's too small, convergence is painfully slow. Finding the right value is one of the key skills in training AI models.
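For the running example $f(x) = x^2$, the three regimes can be worked out exactly. Here $x_k$ denotes the value after $k$ steps and $x_0$ the starting point (notation introduced only for this derivation). Substituting $f'(x) = 2x$ into the update rule gives:

$$x_{k+1} = x_k - \alpha \cdot 2x_k = (1 - 2\alpha)\,x_k \quad\Longrightarrow\quad x_k = (1 - 2\alpha)^k\,x_0$$

The iterates shrink towards 0 only when $|1 - 2\alpha| < 1$, that is, when $0 < \alpha < 1$. For $\alpha$ close to 0 the factor is just below 1, so progress is slow; for $\alpha > 1$ its magnitude exceeds 1 and $|x_k|$ grows at every step. This threshold is specific to $f(x) = x^2$; other functions have different limits.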

Comparing Learning Rates

<pre><code class="language-python">def gradient_descent(lr, steps=30):
    x = 10.0
    for _ in range(steps):
        x = x - lr * (2 * x)   # f'(x) = 2x
    return x

print(gradient_descent(lr=0.01))  # ≈ 5.45: too slow, still far from 0
print(gradient_descent(lr=0.10))  # ≈ 0.012: just right
print(gradient_descent(lr=1.10))  # ≈ 2374 and growing: diverges
</code></pre>
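Looking at the first few iterates, not just the final value, makes overshooting visible. The sketch below reuses the same update for $f(x) = x^2$; the helper name `trajectory` and the choice of five steps are illustrative assumptions.

```python
def trajectory(lr, steps=5, x0=10.0):
    """Return the list of x values visited while minimising f(x) = x**2."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * (2 * xs[-1]))  # x <- x - lr * f'(x)
    return xs


for lr in (0.1, 0.9, 1.1):
    path = trajectory(lr)
    print(f"lr={lr}: " + ", ".join(f"{x:.2f}" for x in path))
```

With `lr=0.1` the values approach 0 from one side. With `lr=0.9` every step jumps past the minimum, so the sign flips each time, but the magnitude still shrinks. With `lr=1.1` the sign flips and the magnitude grows, which is divergence.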

Linear Regression via Gradient Descent

A more realistic example: fit a straight line $\hat{y} = wx + b$ to data by minimising MSE loss $L = \frac{1}{n}\sum(\hat{y}_i - y_i)^2$. The gradients for $w$ and $b$ are derived analytically, and we update both parameters simultaneously each step.
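For reference, the analytic gradients follow from applying the chain rule to $L$ with $\hat{y}_i = wx_i + b$:

$$\frac{\partial L}{\partial w} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_i, \qquad \frac{\partial L}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)$$

Each step then applies $w \leftarrow w - \alpha\,\frac{\partial L}{\partial w}$ and $b \leftarrow b - \alpha\,\frac{\partial L}{\partial b}$, with both gradients evaluated at the old values of $w$ and $b$.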

Training a Linear Model

<pre><code class="language-python">import numpy as np

# Ground truth: y = 2x + 1 with noise
X = np.linspace(0, 5, 50)
Y = 2 * X + 1 + np.random.normal(0, 0.5, 50)

w, b = 0.0, 0.0   # initialise
lr = 0.01

for _ in range(1000):
    y_pred = w * X + b
    error = y_pred - Y

    # MSE gradients
    dw = (2 / len(X)) * np.dot(error, X)
    db = (2 / len(X)) * np.sum(error)

    w -= lr * dw
    b -= lr * db

print(f"w={w:.3f}, b={b:.3f}")  # ≈ w=2.0, b=1.0
</code></pre>
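One way to sanity-check the loop is to compare it with the closed-form least-squares fit, which minimises the same MSE loss directly. The sketch below is an illustration under stated assumptions: the helper name `fit_line_gd` is hypothetical, the seed is fixed only to make the run repeatable, and the step count is raised to 5000 so that gradient descent has effectively converged before the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed (assumption, for repeatability)
X = np.linspace(0, 5, 50)
Y = 2 * X + 1 + rng.normal(0, 0.5, 50)    # same synthetic setup: y = 2x + 1 plus noise


def fit_line_gd(X, Y, lr=0.01, steps=5000):
    """Fit y = w*x + b by gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(X)
    for _ in range(steps):
        error = w * X + b - Y
        dw = (2 / n) * np.dot(error, X)   # dL/dw
        db = (2 / n) * np.sum(error)      # dL/db
        w -= lr * dw
        b -= lr * db
    return w, b


w_gd, b_gd = fit_line_gd(X, Y)
w_ls, b_ls = np.polyfit(X, Y, deg=1)      # least squares; returns [slope, intercept]

print(f"gradient descent: w={w_gd:.4f}, b={b_gd:.4f}")
print(f"least squares:    w={w_ls:.4f}, b={b_ls:.4f}")
```

The two fits should agree closely. Both will differ slightly from the ground truth of 2 and 1, because the noise in `Y` shifts the best-fitting line.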