Explained Variance Ratio
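Each principal component accounts for a fraction of the total variance equal to its eigenvalue (of the data's covariance matrix) divided by the sum of all eigenvalues; scikit-learn stores these fractions in explained_variance_ratio_. Components are ordered by decreasing explained variance, so the first component always captures the largest share.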
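To check this definition numerically, the sketch below computes the eigenvalues of the covariance matrix of the standardized breast cancer data directly with NumPy (np.cov, np.linalg.eigvalsh) and compares their normalized values with scikit-learn's explained_variance_ratio_. It assumes the same dataset and scaling as the scree plot example; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Eigenvalues of the sample covariance matrix (features as columns)
cov = np.cov(X_scaled, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order; reverse it

# Explained variance ratio: each eigenvalue divided by the sum of all eigenvalues
manual_ratio = eigenvalues / eigenvalues.sum()

pca = PCA().fit(X_scaled)
print(np.allclose(manual_ratio, pca.explained_variance_ratio_))
print(manual_ratio[:3])
```

The ratios do not depend on whether the covariance uses n or n - 1 in its denominator, because that constant cancels in the division.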
Scree Plot
<pre><code class="language-python">from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
import numpy as np

# Load the data and standardize each feature to zero mean and unit variance
X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components so the full variance spectrum is available
pca_full = PCA().fit(X_scaled)
ratios = pca_full.explained_variance_ratio_
components = np.arange(1, len(ratios) + 1)  # 1-based component numbers for the x-axis

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Scree plot: per-component variance
axes[0].plot(components, ratios, 'bo-')
axes[0].set_xlabel('Component')
axes[0].set_ylabel('Explained Variance Ratio')
axes[0].set_title('Scree Plot')

# Cumulative variance: the value at x = k is the variance retained by the first k components
cum_var = np.cumsum(ratios)
axes[1].plot(components, cum_var, 'ro-')
axes[1].axhline(0.95, color='gray', linestyle='--', label='95%')
axes[1].set_xlabel('Number of Components')
axes[1].set_ylabel('Cumulative Explained Variance')
axes[1].set_title('Cumulative Variance')
axes[1].legend()

plt.tight_layout()
plt.show()

# searchsorted gives the 0-based index of the first cumulative value >= 0.95; adding 1 turns it into a count
print(f"Components for 95%: {np.searchsorted(cum_var, 0.95) + 1}")</code></pre>
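scikit-learn can also choose k directly from a variance threshold: when n_components is a float between 0 and 1, PCA keeps the smallest number of components whose cumulative explained variance exceeds that fraction. The sketch below sets svd_solver="full" (the solver documented for this mode across scikit-learn releases) and compares the chosen count with the searchsorted result from the cumulative curve.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# A float n_components asks PCA to keep just enough components to exceed 95% of the variance
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X_scaled)
print(f"Components chosen by PCA: {pca_95.n_components_}")
print(f"Variance retained: {pca_95.explained_variance_ratio_.sum():.3f}")

# The same count read off the cumulative curve of a full fit
cum_var = np.cumsum(PCA().fit(X_scaled).explained_variance_ratio_)
k_manual = np.searchsorted(cum_var, 0.95) + 1
print(f"Components from cumulative curve: {k_manual}")

# The reduced data has one column per retained component
X_95 = pca_95.transform(X_scaled)
print(X_95.shape)
```

The threshold form is convenient inside a pipeline, since the number of components is then determined from whatever data the model is fitted on.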
Interpreting Loadings
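The component loadings stored in pca.components_ reveal which original features contribute most to each principal component. Each row is a unit-length principal axis (an eigenvector of the covariance matrix) with one weight per feature; features with large absolute weights dominate that component. The overall sign of a component is arbitrary, so only the relative signs of weights within the same component are meaningful.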
Loading Analysis
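<pre><code class="language-python">import pandas as pd

# Continues from the previous block (PCA, load_breast_cancer and X_scaled already defined)
pca_2 = PCA(n_components=2).fit(X_scaled)
feature_names = load_breast_cancer().feature_names

# components_ has shape (n_components, n_features); transpose so each row is a feature
loadings = pd.DataFrame(pca_2.components_.T,
                        index=feature_names,
                        columns=['PC1', 'PC2'])

# Top 5 contributors to PC1, ranked by absolute weight but printed with their signs
top5 = loadings['PC1'].abs().nlargest(5).index
print(loadings.loc[top5, 'PC1'])</code></pre>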
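Because components_ stores unit-length directions, a weight says nothing about how much variance its component carries. A common alternative, sketched below, scales each axis by the square root of its eigenvalue (explained_variance_); for standardized features the result matches the correlation between each feature and each component score. The check uses np.corrcoef, and the factor sqrt((n - 1) / n) accounts for StandardScaler dividing by n while explained_variance_ divides by n - 1. svd_solver="full" is assumed so the comparison does not depend on the default solver.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)
pca_2 = PCA(n_components=2, svd_solver="full").fit(X_scaled)
scores = pca_2.transform(X_scaled)

# Scale each unit-length axis by the standard deviation of its scores (sqrt of the eigenvalue)
scaled_loadings = pca_2.components_.T * np.sqrt(pca_2.explained_variance_)

# Direct correlation between every feature (column) and every PC score
n_features = X_scaled.shape[1]
corr = np.corrcoef(X_scaled, scores, rowvar=False)[:n_features, n_features:]

# Agreement after correcting for the n vs n - 1 variance conventions
n = X_scaled.shape[0]
print(np.allclose(scaled_loadings * np.sqrt((n - 1) / n), corr))

table = pd.DataFrame(scaled_loadings, index=data.feature_names, columns=["PC1", "PC2"])
print(table.head())
```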
Biplot: Scores and Loadings Together
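A biplot overlays the sample scores (each sample projected onto PC1 and PC2) and the feature loading vectors on the same 2D plot. Loading vectors that point in similar directions indicate positively correlated features, and vectors pointing in opposite directions indicate negative correlation; these readings are reliable only for features that the first two components represent well. Viewing samples and features together shows which features drive the separation between groups of samples.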
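A biplot can be drawn with matplotlib as in the sketch below, which assumes the same standardized breast cancer data and a two-component fit. The arrow scale factor, coloring points by the diagnosis label, and labeling only the five longest loading vectors are arbitrary display choices for readability, not part of the biplot definition.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)
pca_2 = PCA(n_components=2, svd_solver="full").fit(X_scaled)

scores = pca_2.transform(X_scaled)   # sample coordinates on PC1 and PC2
loadings = pca_2.components_.T       # one row per feature: (weight on PC1, weight on PC2)

# Loading vectors are unit-scale; stretch them to be visible next to the scores (display choice only)
arrow_scale = 0.8 * np.abs(scores).max() / np.abs(loadings).max()

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(scores[:, 0], scores[:, 1], c=data.target, cmap="coolwarm", s=10, alpha=0.5)

# Draw every feature vector, but label only the five longest to keep the plot readable
lengths = np.linalg.norm(loadings, axis=1)
top = np.argsort(lengths)[::-1][:5]
for i, (dx, dy) in enumerate(loadings):
    ax.arrow(0, 0, dx * arrow_scale, dy * arrow_scale, color="black", alpha=0.6, head_width=0.15)
    if i in top:
        ax.text(dx * arrow_scale * 1.1, dy * arrow_scale * 1.1, data.feature_names[i], fontsize=8)

ax.set_xlabel(f"PC1 ({pca_2.explained_variance_ratio_[0]:.1%})")
ax.set_ylabel(f"PC2 ({pca_2.explained_variance_ratio_[1]:.1%})")
ax.set_title("Biplot")
plt.tight_layout()
plt.show()
```

The labels only color the points for orientation; PCA itself never sees them.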
Information Loss and Reconstruction
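Retaining only k of the p components discards the variance in the remaining p - k components. That discarded variance is the information loss, measured as the fraction \(1 - \sum_{i=1}^{k} \lambda_i \big/ \sum_{i=1}^{p} \lambda_i\), where \(\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p\) are the covariance eigenvalues and p is the number of original features.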
Reconstruction Error
<pre><code class="language-python">import numpy as np

# Keep k = 5 components, project the standardized data, then map it back to feature space
pca_k = PCA(n_components=5)
X_reduced = pca_k.fit_transform(X_scaled)
X_reconstructed = pca_k.inverse_transform(X_reduced)

# Mean squared error averaged over every sample and every feature
reconstruction_error = np.mean((X_scaled - X_reconstructed) ** 2)
print(f"MSE Reconstruction Error (k=5): {reconstruction_error:.4f}")
print(f"Variance retained: {pca_k.explained_variance_ratio_.sum():.3f}")</code></pre>
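For data standardized with StandardScaler, the mean squared reconstruction error is more than loosely related to the information-loss formula: averaged over every entry, it equals the discarded variance fraction up to floating-point error. Each standardized feature has unit variance under the same divide-by-n convention that np.mean applies to the squared errors, so the total sum of squares is n times p and cancels exactly. The sketch below checks this for a few values of k; svd_solver="full" is assumed so the result does not depend on a randomized solver.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

results = []
for k in (1, 2, 5, 10, 20):
    pca_k = PCA(n_components=k, svd_solver="full").fit(X_scaled)
    X_rec = pca_k.inverse_transform(pca_k.transform(X_scaled))
    mse = np.mean((X_scaled - X_rec) ** 2)
    # Information loss: 1 - (sum of the first k eigenvalues) / (sum of all eigenvalues)
    loss = 1 - pca_k.explained_variance_ratio_.sum()
    results.append((k, mse, loss))
    print(f"k={k:2d}  MSE={mse:.4f}  information loss={loss:.4f}")
```

Without standardization the total sum of squares is no longer n times p, so the two numbers stop coinciding even though both still shrink as k grows.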