K-Nearest Neighbors (KNN) for Classification
K-Nearest Neighbors (KNN) is a simple, non-parametric classifier that makes predictions purely based on the labels of the most similar training examples.
How KNN Works
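To classify a new point, KNN computes its distance to every training point, selects the K nearest, and assigns the class held by the majority of those K neighbours (in two-class problems, an odd K avoids tied votes). There is no explicit training step: the algorithm simply memorises the training set, so the computational cost is deferred to prediction time. Libraries such as scikit-learn can speed up the neighbour search with tree-based indexes instead of checking every point.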
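The procedure above can be sketched from scratch in a few lines of NumPy. This is a minimal illustration, assuming Euclidean distance and a brute-force search over all training points; the function name `knn_predict` and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest
    training points, using brute-force Euclidean distance."""
    # Distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances (stable sort keeps ties in input order)
    nearest = np.argsort(distances, kind="stable")[:k]
    # Majority vote; on a tied count, Counter.most_common returns the label
    # encountered first, i.e. the one belonging to the closer neighbour
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labelled "a" and "b"
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y_train = np.array(["a", "a", "b", "b", "b"])

print(knn_predict(X_train, y_train, np.array([1.2, 1.4]), k=3))  # a
print(knn_predict(X_train, y_train, np.array([5.2, 5.1]), k=3))  # b
```

Note that "training" here is nothing more than keeping `X_train` and `y_train` around; all of the work happens inside `knn_predict`.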
KNN in scikit-learn
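A minimal sketch using scikit-learn's `KNeighborsClassifier`, assuming the Iris dataset bundled with scikit-learn, a stratified 75/25 train/test split, and K = 5; all of these are illustrative choices rather than recommendations. The classifier is wrapped in a pipeline with `StandardScaler` because, as discussed under feature scaling, unscaled features can distort the distances.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data: the Iris dataset that ships with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale features, then classify by majority vote among the 5 nearest neighbours
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# "Fitting" learns the scaler's mean/std and stores the scaled training set;
# the neighbour search itself happens at prediction time
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Other behaviour is controlled through constructor parameters such as `weights` (for example `"distance"` to give closer neighbours more say) and `metric`/`p` for the distance function.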
Distance Metrics and Feature Scaling
Because KNN's predictions depend entirely on distances between points, both the distance metric and the scaling of the features determine which neighbours are selected, and therefore how accurate the classifier is.
Why Scaling Matters
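Features with large numeric ranges dominate distance calculations. For example, if one feature ranges from 0 to 1000 and another from 0 to 1, a difference of just a few units in the first outweighs the largest possible difference in the second, so the first feature almost entirely determines the nearest neighbours. Scale features before using KNN (for example with scikit-learn's StandardScaler), fitting the scaler on the training data only and applying the same transformation to every new point.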
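The effect can be reproduced with a small, hypothetical dataset in which one feature spans roughly 0 to 1000 and the other 0 to 1. This sketch standardises each feature with the training-set mean and standard deviation (the same transformation `StandardScaler` applies) and checks which training point is nearest to a query before and after scaling. The helper `nearest_index` is illustrative only.

```python
import numpy as np

# Hypothetical training data: feature 0 spans ~0-1000, feature 1 spans 0-1
X_train = np.array([
    [0.0,    0.0],
    [1000.0, 1.0],
    [505.0,  0.1],   # row 2: close to the query on feature 0, far on feature 1
    [540.0,  0.9],   # row 3: farther on feature 0, identical on feature 1
])
query = np.array([500.0, 0.9])

def nearest_index(X, q):
    """Index of the row of X closest to q (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))

# Unscaled: differences on feature 0 swamp those on feature 1
raw_nn = nearest_index(X_train, query)

# Standardise using training-set statistics only (population std, ddof=0)
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
scaled_nn = nearest_index((X_train - mean) / std, (query - mean) / std)

print(raw_nn, scaled_nn)  # 2 3
```

Without scaling, row 2 wins because it is only 5 units away on the large-range feature, even though it differs sharply on the small-range one. After standardisation, row 3, which matches the query exactly on the second feature, becomes the nearest neighbour.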
Common Distance Metrics
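Euclidean distance (L2) is the usual default (in scikit-learn's KNeighborsClassifier, metric="minkowski" with p=2) and works well for continuous features. Manhattan distance (L1) sums absolute differences without squaring them, so a single large difference in one feature carries less weight, which makes it more robust to outliers. Minkowski distance with exponent p generalises both: p = 1 gives Manhattan distance and p = 2 gives Euclidean distance. For categorical or binary features, Hamming distance, which counts the positions at which two vectors differ, is usually more appropriate.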
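A short sketch of these metrics, written directly in NumPy for illustration; the helper names `minkowski` and `hamming` are hypothetical. Hamming distance is computed here as the fraction of differing positions, while some definitions use the raw count instead.

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance (sum |a_i - b_i|^p)^(1/p).
    p=1 gives Manhattan (L1) distance, p=2 gives Euclidean (L2) distance."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

def hamming(a, b):
    """Fraction of positions at which two equal-length vectors differ."""
    return float(np.mean(a != b))

x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])
print(minkowski(x, y, 1))  # 7.0  (Manhattan: 3 + 4)
print(minkowski(x, y, 2))  # 5.0  (Euclidean: sqrt(9 + 16))

# Binary feature vectors differing in 2 of 4 positions
u = np.array([1, 0, 1, 1])
v = np.array([1, 1, 0, 1])
print(hamming(u, v))  # 0.5
```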