Handling Categorical Data: Label Encoding
Most ML algorithms require numeric input, so categorical text columns must be encoded as numbers. Label encoding maps each unique category to an integer. This simple approach works well for ordinal data when the integers follow the categories' natural order, but for unordered categories it can mislead models into assuming an order that doesn't exist.
LabelEncoder for Target Variables
sklearn's LabelEncoder is designed primarily for encoding the target variable (y). It maps classes to integers 0, 1, 2, … in sorted order (alphabetical for strings), not by order of first appearance.
Using LabelEncoder
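A minimal sketch of encoding a target vector with LabelEncoder. The labels in `y` are made-up sample data, not taken from any particular dataset; the point is to show the sorted class-to-integer mapping and the round trip back to the original labels.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target labels, chosen only for illustration.
y = ["spam", "ham", "spam", "eggs"]

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

# classes_ holds the sorted unique labels; a label's position is its integer code.
classes = list(encoder.classes_)

# inverse_transform maps the integer codes back to the original labels.
y_decoded = list(encoder.inverse_transform(y_encoded))

print(classes)          # sorted labels: eggs, ham, spam
print(list(y_encoded))  # codes follow the sorted order, not first appearance
```

Because the classes are sorted, "spam" receives code 2 even though it appears first in `y`.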
OrdinalEncoder for Features
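OrdinalEncoder is the correct tool for encoding ordinal feature columns (e.g., Education: high school < bachelor < master < PhD). It handles multiple columns at once and lets you specify a custom category order through its categories parameter. Without that parameter, categories are sorted, which may not match the intended ranking.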
Encoding Ordinal Features
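A short sketch of encoding two ordinal columns with an explicit order. The rows and the second column (a clothing size) are assumptions added for illustration; only the education ranking comes from the text above.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows: [education, size]. Sample data for illustration only.
X = [
    ["bachelor", "M"],
    ["PhD", "S"],
    ["high school", "L"],
    ["master", "M"],
]

# One list per column, written from lowest to highest rank.
education_order = ["high school", "bachelor", "master", "PhD"]
size_order = ["S", "M", "L"]

encoder = OrdinalEncoder(categories=[education_order, size_order])
X_encoded = encoder.fit_transform(X)

# Each value becomes its index in the corresponding order list (as a float).
print(X_encoded)
```

With the order given explicitly, "high school" maps to 0 and "PhD" to 3. Relying on the default sorted order would instead place "PhD" first, since uppercase letters sort before lowercase ones.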
When NOT to Use Label Encoding
For nominal categories (colors, cities, product names) with no inherent order, label encoding imposes a false numerical relationship. With the sorted mapping "London" (0), "Paris" (1), "Tokyo" (2), the model may infer that "London" is closer to "Paris" than to "Tokyo", or that "Tokyo" ranks above "Paris". Use One-Hot Encoding instead for such features.
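As a rough illustration of the alternative, the sketch below one-hot encodes a city column with sklearn's OneHotEncoder. The city values are sample data; the encoder's default sparse output is converted with `.toarray()` only to make the result easy to inspect.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single-column feature; each row is one observation.
cities = [["Paris"], ["London"], ["Tokyo"], ["Paris"]]

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; densify it for display.
one_hot = encoder.fit_transform(cities).toarray()

# One output column per category, in sorted order.
columns = list(encoder.categories_[0])

print(columns)
print(one_hot)
```

Each city gets its own 0/1 column, so no city is numerically "between" two others and every pair of distinct cities is equally far apart.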