Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA) finds linear projections that maximize the ratio of between-class to within-class scatter, making it simultaneously a classifier and a supervised dimensionality reduction technique.
LDA Objective
LDA maximizes J(W) = |W^T S_B W| / |W^T S_W W|, where |·| denotes the determinant, S_B is the between-class scatter matrix, and S_W is the within-class scatter matrix. Assuming S_W is invertible, the columns of the optimal projection W are the eigenvectors of S_W^{-1} S_B with the largest eigenvalues.
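The objective can be sketched directly in NumPy. The block below is a minimal illustration, not a production implementation: the function name lda_directions and the synthetic three-class data are assumptions made for this example, and it assumes S_W is invertible.

```python
import numpy as np

def lda_directions(X, y, n_components):
    """Sketch: return the leading eigenvectors of S_W^{-1} S_B as columns of W."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((d, d))  # within-class scatter
    S_B = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        S_W += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    # Solve S_W M = S_B for M = S_W^{-1} S_B instead of forming the inverse.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]  # largest eigenvalue first
    W = eigvecs.real[:, order[:n_components]]
    return W, eigvals.real[order]

# Hypothetical data: three Gaussian classes in three dimensions.
rng = np.random.default_rng(0)
means = ([0, 0, 0], [3, 0, 0], [0, 3, 0])
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(50, 3)) for m in means])
y = np.repeat([0, 1, 2], 50)

W, eigvals = lda_directions(X, y, n_components=2)
X_projected = X @ W  # samples expressed in the discriminant directions
print(W.shape, X_projected.shape)
```

With three classes, only two eigenvalues are meaningfully nonzero, so the third entry of eigvals should be close to zero up to floating-point error.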
Key Assumptions
- Features follow a multivariate Gaussian distribution within each class.
- All classes share the same covariance matrix (homoscedasticity).
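- As a consequence of the model rather than an assumption, LDA can extract at most min(C-1, d) discriminant components, where C is the number of classes and d is the number of features, because S_B has rank at most C-1.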
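The limit on the number of components comes from the rank of the between-class scatter matrix: the C class-mean deviations from the overall mean are linearly dependent, so S_B has rank at most C-1. The following sketch checks this numerically on synthetic data; the sizes, the random class means, and the 1e-10 relative threshold are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features, n_per_class = 4, 6, 30

# Hypothetical data: each class is a Gaussian blob around a random mean.
class_means = rng.normal(scale=5.0, size=(n_classes, n_features))
X = np.vstack([rng.normal(loc=m, size=(n_per_class, n_features)) for m in class_means])
y = np.repeat(np.arange(n_classes), n_per_class)

# Between-class scatter: weighted outer products of class-mean deviations.
overall_mean = X.mean(axis=0)
S_B = np.zeros((n_features, n_features))
for c in range(n_classes):
    diff = (X[y == c].mean(axis=0) - overall_mean).reshape(-1, 1)
    S_B += n_per_class * (diff @ diff.T)

# Count singular values that are not negligible relative to the largest one.
singular_values = np.linalg.svd(S_B, compute_uv=False)
rank_S_B = int(np.sum(singular_values > 1e-10 * singular_values[0]))
print(rank_S_B)  # 3, i.e. n_classes - 1
```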
LDA in scikit-learn
The LinearDiscriminantAnalysis class in sklearn.discriminant_analysis supports both classification (predict) and dimensionality reduction (transform).
Classification with LDA
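A minimal sketch of LDA as a classifier, assuming the Iris dataset bundled with scikit-learn and a 70/30 stratified split. Both the dataset and the split are choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = LinearDiscriminantAnalysis()  # default solver is "svd"
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)        # hard class labels
proba = clf.predict_proba(X_test)   # per-class posterior probabilities
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

predict returns the most probable class for each sample, while predict_proba exposes the posterior probabilities, which are useful when a decision threshold other than the default is needed.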
Dimensionality Reduction with LDA
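A minimal sketch of LDA as a supervised projection, again assuming the Iris dataset as example data. With three classes, at most two discriminant components are available, so n_components=2 is the largest valid setting here.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)  # labels are required: LDA is supervised

print(X_2d.shape)  # (150, 2)
print(lda.explained_variance_ratio_)  # share of between-class variance per component
```

The projected array X_2d can be plotted or passed to a downstream model in place of the original four features.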
LDA vs. PCA
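PCA is unsupervised and finds the directions of maximum total variance, ignoring class labels; LDA is supervised and finds the directions that maximize class separability. When class labels are available, LDA often produces low-dimensional representations that are more useful for classification, because the directions of highest variance are not necessarily the ones that separate the classes.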
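One way to compare the two projections is to feed each into the same downstream classifier and cross-validate. The sketch below does this under several assumptions made for illustration: the Wine dataset, standardized features, two components, and logistic regression as the classifier. The outcome depends on the data, so it should be read as an experiment template rather than a general result.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

reducers = {
    "PCA": PCA(n_components=2),
    "LDA": LinearDiscriminantAnalysis(n_components=2),
}

scores = {}
for name, reducer in reducers.items():
    # The reducer is fit inside each fold, so no label information leaks
    # from the held-out data into the projection.
    model = make_pipeline(StandardScaler(), reducer, LogisticRegression(max_iter=1000))
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {scores[name]:.3f}")
```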
When to Use LDA
Use LDA when class labels are available and you want to reduce dimensionality for classification. It is most effective when each class is approximately Gaussian and the classes have similar covariances. When the within-class covariances differ substantially, Quadratic Discriminant Analysis (QDA), which fits a separate covariance matrix per class, may be more appropriate. QDA still assumes Gaussian classes, so it does not by itself address strongly non-Gaussian data.
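The effect of unequal covariances can be illustrated with a toy experiment. The sketch below assumes a deliberately extreme synthetic setup, invented for this example: two Gaussian classes with the same mean but different spreads, so no linear boundary can separate them.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical data: same mean, very different covariances.
X_tight = rng.normal(scale=1.0, size=(500, 2))
X_wide = rng.normal(scale=3.0, size=(500, 2))
X = np.vstack([X_tight, X_wide])
y = np.repeat([0, 1], 500)

lda_score = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean()
qda_score = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=5).mean()

print(f"LDA mean CV accuracy: {lda_score:.3f}")
print(f"QDA mean CV accuracy: {qda_score:.3f}")
```

On data like this, LDA should perform close to chance because the class means coincide, while QDA can learn a curved boundary from the differing covariances. Real datasets are rarely this clear-cut, so cross-validating both models is a reasonable way to choose.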