Choosing the Right K in KNN
The single most important hyperparameter in KNN is K — too small and the model overfits to noise; too large and it smooths away the decision boundary entirely.
K and the Bias-Variance Tradeoff
A small K (e.g., K=1) creates a very flexible, jagged decision boundary that fits noise — high variance. A large K averages over many neighbours, producing a smoother boundary — high bias. Optimal K sits between these extremes.
K=1: Perfect Training Accuracy, Poor Generalisation
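With K=1, each training point is its own nearest neighbour, so the model reproduces every training label and reaches 100% training accuracy (assuming no duplicate points carry conflicting labels). That score is misleading: the model has memorised every noise artefact in the data, so it tends to perform poorly on unseen examples.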
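The two extremes can be checked on a toy example. The sketch below is a minimal pure-Python KNN, not a reference implementation; the helper names (`knn_predict`, `training_accuracy`) and the seven data points are hypothetical, chosen so that one point is deliberately mislabelled to stand in for noise.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k):
    """Predict by majority vote among the k nearest training points (Euclidean)."""
    # Rank training indices by distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def training_accuracy(train_X, train_y, k):
    """Fraction of training points whose own label is predicted correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(train_X, train_y))
    return hits / len(train_X)

# Hypothetical toy data: two clusters plus one noisy point.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),   # cluster near the origin, label 0
     (1.0, 1.0), (1.1, 0.9), (0.9, 1.2),   # cluster near (1, 1), label 1
     (0.15, 0.15)]                         # sits in cluster 0 but is labelled 1
y = [0, 0, 0, 1, 1, 1, 1]

acc_k1 = training_accuracy(X, y, k=1)        # every point is its own neighbour
acc_kn = training_accuracy(X, y, k=len(X))   # every query sees the same votes

# A new point in the middle of cluster 0, close to the noisy point.
query = (0.1, 0.1)
pred_k1 = knn_predict(X, y, query, k=1)  # follows the noisy neighbour
pred_k3 = knn_predict(X, y, query, k=3)  # outvoted by two clean neighbours

print(acc_k1, acc_kn, pred_k1, pred_k3)
```

On this data K=1 scores perfectly on the training set yet copies the noisy label for the nearby query, while K equal to the training set size predicts the majority class everywhere, so the decision boundary disappears.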
Selecting K via Cross-Validation
Evaluate a range of K values with cross-validation and select the K with the best mean validation score across folds.
Grid Search for K
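One way to run such a search is scikit-learn's `GridSearchCV`. The sketch below assumes scikit-learn is installed and uses a synthetic dataset from `make_classification` purely for illustration; the candidate range (odd K from 1 to 31), the 5-fold setting, and the accuracy metric are assumptions, not recommendations from the text.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)

# Hold out a test set that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Candidate K values: odd numbers only, since this is a binary problem.
param_grid = {"n_neighbors": list(range(1, 32, 2))}

# 5-fold cross-validation on the training data for each candidate K.
search = GridSearchCV(KNeighborsClassifier(), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]
cv_score = search.best_score_                 # mean validation accuracy for best_k
test_accuracy = search.score(X_test, y_test)  # evaluated once, after selection

print(f"best K = {best_k}, CV accuracy = {cv_score:.3f}, "
      f"test accuracy = {test_accuracy:.3f}")
```

The test set is scored only once, after K has been chosen, so it stays an unbiased estimate of generalisation.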
Practical Guidelines
A common starting heuristic is K ≈ √n, where n is the training set size. For binary classification, prefer an odd K so that votes cannot tie. Treat these as starting points only: cross-validation on the training data (never the test set) should be the final arbiter.
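The heuristic is easy to encode as a starting value for a search. The function below is a hypothetical helper (`heuristic_k` is not a library function); rounding √n to the nearest integer and bumping an even result up to the next odd number are assumptions, since the rule of thumb does not specify either.

```python
import math

def heuristic_k(n_train, binary=True):
    """Starting guess for K: roughly sqrt(n), made odd for binary problems."""
    k = max(1, round(math.sqrt(n_train)))
    # For two classes an odd K rules out tied votes.
    if binary and k % 2 == 0:
        k += 1
    return k

print(heuristic_k(100))  # sqrt(100) = 10, bumped to 11
print(heuristic_k(50))   # sqrt(50) is about 7.07, rounds to 7
```

The result is best used to centre the range of candidates passed to cross-validation rather than as a final choice.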