Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA) finds linear projections that maximize the ratio of between-class to within-class scatter, making it simultaneously a classifier and a supervised dimensionality reduction technique.

LDA Objective

LDA maximizes J(W) = |W^T S_B W| / |W^T S_W W|, where |·| denotes the determinant, S_B is the between-class scatter matrix, and S_W is the within-class scatter matrix. The columns of the optimal projection W are the eigenvectors of S_W^{-1} S_B with the largest eigenvalues.
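The objective can be sketched directly in NumPy. The block below is a minimal illustration, not scikit-learn's implementation. It assumes the usual definitions S_W = sum over classes of the centered outer products and S_B = sum over classes of n_c (mean_c - mean)(mean_c - mean)^T, which the text above does not spell out. Instead of forming S_W^{-1} S_B explicitly, it solves the equivalent generalized eigenproblem S_B w = lambda S_W w with scipy.linalg.eigh, a numerical choice made for this sketch.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
n_features = X.shape[1]
overall_mean = X.mean(axis=0)

# Accumulate within-class (S_W) and between-class (S_B) scatter matrices.
S_W = np.zeros((n_features, n_features))
S_B = np.zeros((n_features, n_features))
for c in classes:
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)
    centered = X_c - mean_c
    S_W += centered.T @ centered
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += X_c.shape[0] * (diff @ diff.T)

# Solve S_B w = lambda * S_W w; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = eigh(S_B, S_W)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Keep the eigenvectors with the largest eigenvalues as the projection W.
W = eigvecs[:, : len(classes) - 1]
X_proj = X @ W

print("Eigenvalues (descending):", np.round(eigvals, 4))
print("Projected shape:", X_proj.shape)
```

Only the leading eigenvalues are meaningfully larger than zero; the remaining ones are numerical noise, which is why the projection keeps just the first few columns.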

Key Assumptions

Features follow a multivariate Gaussian distribution within each class.
All classes share the same covariance matrix (homoscedasticity).
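LDA can extract at most C-1 discriminant components (where C is the number of classes), or fewer when the data has fewer than C-1 features. This limit is a consequence of S_B having rank at most C-1 rather than an assumption about the data.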
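The limit on the number of components can be checked empirically. The sketch below uses synthetic data from make_blobs and a hypothetical helper, n_discriminants, introduced only for this illustration. It reports how many columns transform returns for a given number of classes and features.

```python
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def n_discriminants(n_classes, n_features):
    """Fit LDA on synthetic blobs and return the number of projected columns."""
    X, y = make_blobs(
        n_samples=300,
        centers=n_classes,
        n_features=n_features,
        random_state=0,
    )
    lda = LinearDiscriminantAnalysis().fit(X, y)
    return lda.transform(X).shape[1]


# Five classes in ten dimensions: the class count is the limiting factor.
print("5 classes, 10 features:", n_discriminants(5, 10))

# Five classes in three dimensions: the feature count is the limiting factor.
print("5 classes, 3 features:", n_discriminants(5, 3))
```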

LDA in scikit-learn

LinearDiscriminantAnalysis in sklearn supports both classification (predict) and dimensionality reduction (transform).

Classification with LDA

<pre><code class="language-python">from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, y, cv=5)
print(f"LDA CV Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")</code></pre>

Dimensionality Reduction with LDA

<pre><code class="language-python"># Continues from the classification example: reuses lda, X, y, and load_iris.
import matplotlib.pyplot as plt

lda.fit(X, y)
X_lda = lda.transform(X)  # Project onto (C-1)=2 discriminant axes

plt.figure(figsize=(8, 5))
for cls, name in enumerate(load_iris().target_names):
    mask = y == cls  # Select the samples belonging to this class
    plt.scatter(X_lda[mask, 0], X_lda[mask, 1], label=name, alpha=0.7)
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.title('LDA Projection of Iris')
plt.legend()
plt.show()

print(f"Explained variance ratio: {lda.explained_variance_ratio_}")</code></pre>

LDA vs. PCA

PCA is unsupervised and maximizes total variance; LDA is supervised and maximizes class separability. Because the directions of highest variance are not necessarily the ones that separate the classes, LDA typically produces better low-dimensional representations for classification when class labels are available.
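One way to check this comparison on a particular dataset is to reduce to the same number of dimensions with each method and cross-validate the same downstream classifier. The sketch below is an illustration under assumptions that are not in the text above: it uses Iris, a single component, and logistic regression as the classifier. Results on other datasets may differ.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Both reducers project onto one dimension; only LDA sees the labels.
reducers = {
    "PCA": PCA(n_components=1),
    "LDA": LinearDiscriminantAnalysis(n_components=1),
}

results = {}
for name, reducer in reducers.items():
    # The pipeline refits the reducer inside each fold, avoiding leakage.
    pipe = make_pipeline(reducer, LogisticRegression(max_iter=1000))
    results[name] = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name} (1 component) CV accuracy: {results[name]:.3f}")
```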

When to Use LDA

Use LDA when class labels are available and you want to reduce dimensionality for classification. It is particularly powerful when classes are well-separated Gaussians with similar covariances. When the within-class covariances differ substantially, Quadratic Discriminant Analysis (QDA), which fits a separate covariance matrix per class, may be more appropriate. QDA still assumes Gaussian classes, so for strongly non-Gaussian data neither method is guaranteed to fit well.
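The effect of unequal covariances can be sketched on synthetic data. The example below is deliberately extreme and is an assumption of this illustration, not a benchmark: two Gaussian classes share the same mean and differ only in spread, so no linear boundary can separate them while a quadratic one can.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500

# Class 0 is a tight Gaussian, class 1 a wide one, both centered at the origin.
X0 = rng.normal(loc=0.0, scale=1.0, size=(n, 2))
X1 = rng.normal(loc=0.0, scale=4.0, size=(n, 2))
X = np.vstack([X0, X1])
y = np.repeat([0, 1], n)

lda_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
qda_acc = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=5).mean()

print(f"LDA CV accuracy: {lda_acc:.3f}")
print(f"QDA CV accuracy: {qda_acc:.3f}")
```

With equal means the shared-covariance model has nothing to discriminate on, whereas the per-class covariances let QDA draw a roughly circular boundary around the tighter class.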