Calculating Eigenvalues using SciPy

Eigenvalues and eigenvectors reveal the "natural axes" of a matrix: the eigenvectors are the directions it stretches without rotating, and the eigenvalues say by how much. This is the mathematical foundation of PCA (dimensionality reduction) and spectral clustering, and it helps explain gradient scaling in deep networks. NumPy and SciPy make computing them straightforward.


What Eigenvalues Tell You

For a square matrix $A$, an eigenvector is a nonzero vector $\mathbf{v}$ satisfying $A\mathbf{v} = \lambda \mathbf{v}$ for some scalar $\lambda$, called its eigenvalue. The matrix only scales the vector by $\lambda$ without rotating it; a negative $\lambda$ also reverses its direction.
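The defining equation can be checked numerically. The sketch below uses a small symmetric matrix chosen purely for illustration (it is an assumption, not taken from any dataset) and confirms that each eigenpair returned by `np.linalg.eig` satisfies $A\mathbf{v} = \lambda \mathbf{v}$ up to floating-point error.

```python
import numpy as np

# Hypothetical example matrix, chosen only for illustration
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

for i in range(len(eigenvalues)):
    lam = eigenvalues[i]
    v = eigenvectors[:, i]  # the i-th eigenvector is the i-th COLUMN
    # A @ v and lam * v should agree up to rounding error
    print(f"lambda = {lam:.3f}, A v == lambda v:", np.allclose(A @ v, lam * v))
```

Note that the eigenvectors are read column by column; indexing rows instead is a common source of silent errors.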

For a covariance matrix, large eigenvalues correspond to directions of large variance in the data. PCA finds these directions and projects the data onto them, reducing the number of dimensions while keeping as much of the variance as possible.
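The link between eigenvalues and variance can be verified directly. In this minimal sketch the data, the seed, and the per-axis scales are all assumptions made for illustration; the check is that the sample variance of the data projected onto each eigenvector of the covariance matrix matches the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed here for reproducibility
# Hypothetical 2-D data: wide spread on the first axis, narrow on the second
data = rng.standard_normal((500, 2)) * np.array([3.0, 0.5])

cov = np.cov(data, rowvar=False)       # rows are samples, columns are features
values, vectors = np.linalg.eigh(cov)  # symmetric input, eigenvalues ascending

centred = data - data.mean(axis=0)
# Sample variance (ddof=1, matching np.cov) along each eigenvector direction
variances = (centred @ vectors).var(axis=0, ddof=1)

print("Eigenvalues:          ", values)
print("Projected variances:  ", variances)
print("Match:", np.allclose(variances, values))
```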

Computing Eigenvalues with NumPy

<pre><code class="language-python">import numpy as np

A = np.array([[4, 2], [1, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:", eigenvalues)  # [5. 2.]

# Each column is an eigenvector
print("Eigenvectors (columns):\n", eigenvectors)
</code></pre>
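As a hand check on the matrix $A = \begin{pmatrix} 4 & 2 \\ 1 & 3 \end{pmatrix}$ used in the code, the eigenvalues are the roots of the characteristic polynomial. This worked derivation is added for illustration:

$$
\det(A - \lambda I) = (4 - \lambda)(3 - \lambda) - 2 \cdot 1 = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2)
$$

so $\lambda_1 = 5$ and $\lambda_2 = 2$. Solving $(A - \lambda I)\mathbf{v} = \mathbf{0}$ for each root gives

$$
\mathbf{v}_1 \propto \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \qquad \mathbf{v}_2 \propto \begin{pmatrix} 1 \\ -1 \end{pmatrix}
$$

NumPy returns these directions normalised to unit length, and the sign of each column may differ from the hand calculation.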

Using SciPy for Symmetric Matrices

Covariance matrices are always symmetric. For symmetric matrices, scipy.linalg.eigh() is preferred over np.linalg.eig(): it always returns real eigenvalues (no complex artifacts), sorted in ascending order, and is faster because it exploits symmetry.
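A side-by-side comparison makes the difference concrete. The sketch below uses a small symmetric tridiagonal matrix picked only for illustration (an assumption); it shows that `eigh` returns sorted eigenvalues and orthonormal eigenvectors, and that `scipy.linalg.eigvalsh` can be used when only the eigenvalues are needed.

```python
import numpy as np
from scipy import linalg

# Hypothetical symmetric matrix, chosen only for illustration
S = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

# General-purpose solver: no ordering guarantee on the eigenvalues
w_general, _ = np.linalg.eig(S)

# Symmetric solver: real eigenvalues in ascending order
w_sym, V = linalg.eigh(S)

# Eigenvalues only, skipping the eigenvector computation
w_only = linalg.eigvalsh(S)

print("eig  (unsorted):", w_general)
print("eigh (ascending):", w_sym)
print("Same spectrum:", np.allclose(np.sort(w_general.real), w_sym))
print("Orthonormal eigenvectors:", np.allclose(V.T @ V, np.eye(3)))
```

The orthonormality check matters for PCA: it is what lets a projection onto the eigenvectors be written as a plain matrix product.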

Eigendecomposition of a Covariance Matrix

<pre><code class="language-python">from scipy import linalg
import numpy as np

# Create some random data
data = np.random.randn(100, 4)

# Compute the covariance matrix (4x4, symmetric)
cov = np.cov(data.T)

# Use eigh for symmetric matrices (more stable)
values, vectors = linalg.eigh(cov)

# Eigenvalues come back in ascending order; flip for descending
values = values[::-1]
vectors = vectors[:, ::-1]

print("Top eigenvalue:", values[0])
</code></pre>
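When only the largest few eigenpairs are needed, `scipy.linalg.eigh` can be asked for just that slice of the spectrum through its `subset_by_index` parameter. The sketch below assumes a SciPy version that provides this parameter (1.5 or later) and uses seeded random data purely for illustration.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(42)  # fixed seed, assumed here for reproducibility
data = rng.standard_normal((100, 4))
cov = np.cov(data, rowvar=False)
n = cov.shape[0]

# Indices refer to the ascending order, so [n - 2, n - 1] selects the two largest
top_values, top_vectors = linalg.eigh(cov, subset_by_index=[n - 2, n - 1])

# Cross-check against the full decomposition
all_values, _ = linalg.eigh(cov)

print("Top-2 eigenvalues:", top_values)
print("Agrees with full spectrum:", np.allclose(top_values, all_values[-2:]))
```

For a 4x4 matrix the saving is negligible; the option is more likely to pay off when the matrix is large and only a handful of components are kept.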

PCA from Scratch Using Eigenvalues

PCA is simply: centre the data → compute the covariance matrix → find its eigenvectors → project the data onto the top-$k$ eigenvectors. The eigenvectors corresponding to the largest eigenvalues are the principal components.
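The same pipeline can be written compactly in matrix form. The symbols here are notation introduced for this summary: $X_c$ is the $n \times d$ data matrix after subtracting each column's mean, $C$ is the sample covariance matrix, and $V_k$ stacks the eigenvectors belonging to the $k$ largest eigenvalues as columns.

$$
C = \frac{1}{n-1} X_c^\top X_c, \qquad C\,\mathbf{v}_i = \lambda_i \mathbf{v}_i \quad (\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d), \qquad Z = X_c V_k
$$

The reduced data $Z$ has shape $n \times k$. The $\frac{1}{n-1}$ factor matches the default normalisation of `np.cov`.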

Manual PCA in NumPy

<pre><code class="language-python">import numpy as np

data = np.random.randn(200, 5)  # 200 samples, 5 features

# Centre the data
data_centred = data - data.mean(axis=0)

# Covariance matrix
cov = np.cov(data_centred.T)

# Eigendecomposition (eigh returns eigenvalues in ascending order)
values, vectors = np.linalg.eigh(cov)

# Select top 2 principal components (largest eigenvalues),
# reversed so the first column has the largest eigenvalue
top2 = vectors[:, -2:][:, ::-1]  # shape (5, 2)
projected = data_centred @ top2  # shape (200, 2)

print("Reduced shape:", projected.shape)
</code></pre>
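Two quick checks help confirm that a manual PCA behaves as expected: the share of total variance each component captures, and whether the projected features are uncorrelated. The following is a minimal sketch; the seed and the per-feature scales are assumptions chosen so that the components differ visibly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed here for reproducibility
# Hypothetical data: 200 samples, 5 features with deliberately different spreads
data = rng.standard_normal((200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

data_centred = data - data.mean(axis=0)
cov = np.cov(data_centred, rowvar=False)
values, vectors = np.linalg.eigh(cov)

# Reorder so the largest eigenvalue comes first
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

# Fraction of the total variance captured by each component
explained_ratio = values / values.sum()

# Project onto the top 2 components
projected = data_centred @ vectors[:, :2]

# The projected features should be uncorrelated, with variances
# equal to the top 2 eigenvalues
projected_cov = np.cov(projected, rowvar=False)

print("Explained variance ratio:", np.round(explained_ratio, 3))
print("Decorrelated:", np.allclose(projected_cov, np.diag(values[:2])))
```

The explained variance ratio is a common way to choose $k$: keep adding components until the cumulative ratio passes a threshold that suits the application.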