Matrix Multiplication in Python (np.dot vs @)
A forward pass in a neural network is built largely from matrix multiplications. Using * where @ is intended can fail silently: when the two shapes are identical or broadcast-compatible, NumPy raises no error and returns element-wise products instead of the matrix product. This topic makes the distinction clear.
Element-wise vs. Matrix Multiplication
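Element-wise (*): multiplies corresponding elements. Requires both arrays to have the same shape, or shapes that NumPy can broadcast together. This is not the operation a dense layer uses to combine inputs with weights.
Matrix multiplication (@ or np.dot): the standard linear algebra operation $C_{ij} = \sum_k A_{ik} B_{kj}$. Requires the inner dimensions to match: (m, k) @ (k, n) → (m, n). For 2-D arrays, @ and np.dot give the same result; they behave differently for arrays with more than two dimensions.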
The Difference in Code
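The contrast is easiest to see on two small square matrices, where both operators run without error but return different values. The sketch below is illustrative only; the matrices `A` and `B` and the variable names are chosen for this example and do not come from the original text.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Element-wise: each entry of A times the entry of B at the same position.
elementwise = A * B

# Matrix product: each row of A dotted with each column of B.
matmul = A @ B
matmul_dot = np.dot(A, B)  # same result as @ for 2-D arrays

print(elementwise)
# [[ 5 12]
#  [21 32]]
print(matmul)
# [[19 22]
#  [43 50]]
```

Both calls succeed because the shapes are identical, which is exactly the situation in which a mistaken * goes unnoticed.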
Shape Rules for Matrix Multiplication
The inner dimensions must match. If $A$ is shape (m, k) and $B$ is shape (k, n), the result is shape (m, n). The k dimension is consumed; m and n survive.
Checking Shapes
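One way to check the (m, k) @ (k, n) → (m, n) rule is to print the `.shape` of each operand and of the result. The following is a minimal sketch with arbitrarily chosen dimensions (3, 4, and 2); the array names are assumptions made for illustration.

```python
import numpy as np

A = np.zeros((3, 4))  # (m, k) = (3, 4)
B = np.zeros((4, 2))  # (k, n) = (4, 2)

# Inner dimensions match (4 == 4), so the product is defined.
C = A @ B
print(A.shape, B.shape, C.shape)  # (3, 4) (4, 2) (3, 2)

# Reversing the order gives (4, 2) @ (3, 4): inner dimensions 2 and 3
# do not match, so NumPy raises a ValueError.
try:
    B @ A
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("Shape mismatch:", err)
```

Unlike the * mistake, a mismatch in the inner dimensions is reported immediately, so printing shapes before the multiply is usually enough to locate the problem.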
The Neural Network Forward Pass
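A single dense layer computes $\mathbf{y} = \mathbf{X} \mathbf{W} + \mathbf{b}$ where $\mathbf{X}$ is the input batch, $\mathbf{W}$ is the weight matrix, and $\mathbf{b}$ is the bias. If $\mathbf{X}$ has shape (batch, inputs) and $\mathbf{W}$ has shape (inputs, outputs), the product has shape (batch, outputs), and the bias of shape (outputs,) is broadcast across every row. This entire operation is one matrix multiply and one broadcast addition.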
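The dense-layer formula can be sketched in a few lines of NumPy. The sizes (a batch of 4 samples, 3 input features, 2 output units), the random initialization, and the zero bias are assumptions chosen for illustration, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for repeatable values

batch_size, n_in, n_out = 4, 3, 2  # hypothetical layer sizes

X = rng.normal(size=(batch_size, n_in))  # input batch, shape (4, 3)
W = rng.normal(size=(n_in, n_out))       # weights, shape (3, 2)
b = np.zeros(n_out)                      # bias, shape (2,)

# One matrix multiply, then one broadcast addition:
# (4, 3) @ (3, 2) -> (4, 2); b is added to every row.
y = X @ W + b

print(y.shape)  # (4, 2)
```

Each row of `y` is the layer output for one sample in the batch, so the same line of code handles a batch of any size.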