Matrix Multiplication in Python (np.dot vs @)

Every forward pass in a neural network is built around a sequence of matrix multiplications. Getting the syntax wrong by using * instead of @ can run without any error (whenever the two shapes happen to be equal or broadcastable) and still produce completely wrong results. This topic makes the distinction crystal clear.


Element-wise vs. Matrix Multiplication

Element-wise (*): multiplies corresponding elements. Requires both arrays to have the same shape, or shapes that NumPy can broadcast together. This is not the operation a dense layer uses to combine inputs with weights.

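Matrix multiplication (@ or np.dot): the standard linear algebra operation $C_{ij} = \sum_k A_{ik} B_{kj}$. Requires the inner dimensions to match: (m, k) @ (k, n) → (m, n). For 2-D arrays, A @ B and np.dot(A, B) give the same result; they behave differently for arrays with more than two dimensions.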
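
To make the summation formula concrete, here is a minimal sketch that spells out $C_{ij} = \sum_k A_{ik} B_{kj}$ with explicit loops and compares it to @. The helper name matmul_loops and the example matrices are assumptions made for illustration; real code should use @, which is far faster.

```python
import numpy as np

def matmul_loops(A, B):
    """Naive triple-loop matrix product (hypothetical helper, illustration only)."""
    m, k = A.shape
    k2, n = B.shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} != {k2}")
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i in range(m):          # rows of A
        for j in range(n):      # columns of B
            for p in range(k):  # the shared inner dimension
                C[i, j] += A[i, p] * B[p, j]
    return C

A = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
B = np.array([[1, 0], [0, 1], [2, 2]])  # shape (3, 2)

C = matmul_loops(A, B)
print(C)
# [[ 7  8]
#  [16 17]]
print(np.array_equal(C, A @ B))  # True
```

The innermost loop runs over the shared dimension k, which is why that dimension disappears from the result shape.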

The Difference in Code

<pre><code class="language-python">import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Element-wise multiplication (Hadamard product)
hadamard = A * B
# [[ 5, 12],
#  [21, 32]]

# True matrix multiplication
matmul = A @ B  # or np.dot(A, B)
# [[19, 22],
#  [43, 50]]
</code></pre>
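
The reason this mistake is easy to miss is that, for square matrices, both operators return an array of the same shape. The sketch below checks that directly and also shows how broadcasting lets * succeed with a vector operand. The vector v is an assumption added for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
v = np.array([10, 20])  # shape (2,), illustrative vector

elementwise = A * B
product = A @ B

# Same shape, different values: nothing signals the mistake.
same_shape = elementwise.shape == product.shape
same_values = np.array_equal(elementwise, product)
print(same_shape, same_values)  # True False

# With a vector, * broadcasts v across the rows of A,
# while @ computes the matrix-vector product.
scaled = A * v
matvec = A @ v
print(scaled)  # [[10 40]
               #  [30 80]]
print(matvec)  # [ 50 110]
```

A quick shape check is therefore not enough to catch a stray *; comparing against a small hand-computed example is a more reliable test.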

Shape Rules for Matrix Multiplication

The inner dimensions must match. If $A$ is shape (m, k) and $B$ is shape (k, n), the result is shape (m, n). The k dimension is consumed; m and n survive.

Checking Shapes

<pre><code class="language-python">import numpy as np

# (3, 4) @ (4, 2) → (3, 2) ✅
X = np.random.rand(3, 4)
W = np.random.rand(4, 2)
print((X @ W).shape)  # (3, 2)

# (3, 4) @ (3, 2) → ERROR ❌ inner dims 4 ≠ 3
</code></pre>
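
The failing case can also be run to see what NumPy actually does. The sketch below catches the exception raised by a mismatched @ and adds a small helper, matmul_shape, that predicts the result shape for 2-D operands. Both the helper and the name W_bad are hypothetical and exist only for this illustration.

```python
import numpy as np

def matmul_shape(a_shape, b_shape):
    """Predict the shape of A @ B for 2-D shapes (hypothetical helper)."""
    m, k = a_shape
    k2, n = b_shape
    if k != k2:
        raise ValueError(f"inner dims {k} != {k2}")
    return (m, n)

X = np.random.rand(3, 4)
W_bad = np.random.rand(3, 2)  # inner dims 4 and 3 do not match

try:
    X @ W_bad
    failed = False
except ValueError as err:
    failed = True
    print("matmul failed:", err)

print(failed)                        # True
print(matmul_shape((3, 4), (4, 2)))  # (3, 2)
```

Unlike the * mix-up, a shape mismatch under @ fails loudly with a ValueError, so it is usually caught the first time the code runs.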

The Neural Network Forward Pass

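A single dense layer computes $\mathbf{y} = \mathbf{X} \mathbf{W} + \mathbf{b}$ where $\mathbf{X}$ is the input batch with shape (batch, inputs), $\mathbf{W}$ is the weight matrix with shape (inputs, outputs), and $\mathbf{b}$ is the bias vector with shape (outputs,). The entire operation is one matrix multiply followed by one broadcast addition that adds $\mathbf{b}$ to every row of the result.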

One Dense Layer from Scratch

<pre><code class="language-python">import numpy as np

# Batch of 5 samples, 3 input features
X = np.random.rand(5, 3)

# Weight matrix: 3 inputs → 4 outputs
W = np.random.rand(3, 4)
b = np.zeros(4)  # bias for 4 outputs

# Forward pass
output = X @ W + b  # shape (5, 4)
print(output.shape)  # (5, 4)
</code></pre>
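
To check that the broadcast addition really applies the same bias to every sample, the following sketch compares the batched computation with a row-by-row version. The function name dense_forward, the seed, and the nonzero bias values are assumptions chosen for illustration, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

def dense_forward(X, W, b):
    """One dense layer without activation (hypothetical helper)."""
    if X.shape[1] != W.shape[0]:
        raise ValueError(f"inner dims {X.shape[1]} != {W.shape[0]}")
    return X @ W + b  # b is broadcast across the batch dimension

X = rng.random((5, 3))              # 5 samples, 3 features
W = rng.random((3, 4))              # 3 inputs -> 4 outputs
b = np.array([0.1, 0.2, 0.3, 0.4])  # nonzero so the bias is visible

output = dense_forward(X, W, b)

# Process each sample separately and stack the results.
row_by_row = np.stack([X[i] @ W + b for i in range(X.shape[0])])
matches = np.allclose(output, row_by_row)

print(output.shape)  # (5, 4)
print(matches)       # True
```

The batched form gives the same numbers as the loop but does the work in a single matrix multiply, which is why the (batch, features) layout is the usual convention.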