Lasso Regression (L1 Penalty) and Feature Selection
Lasso regression adds an L1 penalty to the least-squares objective. Unlike an L2 penalty, it can shrink some coefficients to exactly zero, so fitting the model also performs feature selection.
The L1 Penalty and Sparsity
Lasso minimises SSR + α Σ|βᵢ|, where SSR is the sum of squared residuals and α ≥ 0 controls the penalty strength. The L1 ball has corners on the coordinate axes, and the penalised solution often lands on one of these corners, where the coefficients of non-essential features are exactly zero.
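To see where the exact zeros come from, consider the special case of an orthonormal design (XᵀX = I), which is an assumption made here purely for illustration. In that case the Lasso solution is the soft-thresholded least-squares estimate: each coefficient is moved toward zero by α/2 and set to zero if it does not survive the move. The function name soft_threshold and the sample coefficients are hypothetical.

```python
import numpy as np

def soft_threshold(b_ols, alpha):
    """Lasso solution for SSR + alpha * sum(|beta|), assuming X^T X = I.

    b_ols holds the ordinary least-squares coefficients.
    """
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - alpha / 2.0, 0.0)

# Illustrative OLS coefficients: two large, two small.
b_ols = np.array([3.0, -0.4, 0.1, -2.0])
beta = soft_threshold(b_ols, alpha=1.0)
print(beta)
```

The two small coefficients fall inside the threshold and become exactly zero, while the large ones are shrunk by 0.5 but kept.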
Fitting Lasso in scikit-learn
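A minimal sketch of a fit might look like the following. The synthetic dataset, the choice alpha=1.0, and the scaling step are assumptions for illustration, not settings taken from this text. Note that scikit-learn's Lasso divides the squared-error term by 2n, so its alpha is not numerically the same as α in the formula above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed): 20 features, only 5 of which drive the target.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0
)

# Standardise first so the penalty treats all features on the same scale.
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)

coef = model.named_steps["lasso"].coef_
selected = np.flatnonzero(coef)  # indices of features with non-zero weight
print(f"{len(selected)} of {coef.size} features kept:", selected)
```

In practice alpha is usually tuned by cross-validation, for example with LassoCV, rather than fixed by hand.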
Limitations of Lasso
Lasso tends to arbitrarily select one feature from a group of correlated features and discard the rest, which can be misleading when correlated features are jointly important.
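This behaviour can be explored with two nearly identical features. The sketch below uses made-up data in which both features contribute equally to the target; the variable names and the value alpha=0.1 are assumptions. Which feature ends up with most of the weight depends on small details of the data and the solver, which is the point of the limitation.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500

# Two almost perfectly correlated features (assumed setup).
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])

# Both features contribute equally to the target.
y = x1 + x2 + 0.1 * rng.normal(size=n)

coef = Lasso(alpha=0.1).fit(X, y).coef_
print("coefficients:", coef)
print("sum of coefficients:", coef.sum())
```

The sum of the two coefficients stays close to the combined true effect, but the split between them is not a reliable guide to the importance of either feature on its own.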
Lasso vs. Ridge: When to Choose Which
Use Lasso when you believe only a few features truly matter and want an interpretable, sparse model. Use Ridge when many features contribute and correlated features should be kept together. Use Elastic Net when you want sparsity but also want correlated features to be selected as a group.
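One way to compare the three penalties is to fit them on the same data and count the non-zero coefficients. The dataset and the hyperparameters below (alpha=1.0, l1_ratio=0.5) are assumptions chosen for illustration, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic data (assumed): 30 features, 5 informative.
X, y = make_regression(
    n_samples=200, n_features=30, n_informative=5, noise=10.0, random_state=0
)

models = {
    "lasso": Lasso(alpha=1.0),
    "ridge": Ridge(alpha=1.0),
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

# Count how many coefficients each model leaves non-zero.
nonzero = {
    name: int(np.count_nonzero(model.fit(X, y).coef_))
    for name, model in models.items()
}
print(nonzero)
```

Ridge shrinks coefficients but leaves all of them non-zero, whereas Lasso and Elastic Net can drop features entirely.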