Advanced Imputation: KNN and Regression
When a feature with missing values is related to other features, simple mean/median imputation ignores that relationship and leaves predictive signal on the table. KNN and regression imputation use the structure of the data itself to generate better estimates.
KNN Imputation
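KNN imputation finds the k most similar rows, measured by Euclidean distance over the features that are observed in both rows, and fills in the missing value with the average of that feature across those neighbors (optionally weighted by distance). It captures local patterns, but it is computationally expensive on large datasets because it has to compute distances between rows.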
Using KNNImputer
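A minimal sketch of the idea described above, using scikit-learn's KNNImputer. The toy array and the choice of n_neighbors=2 are assumptions made for illustration, not values from this lesson.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data (made up for illustration): rows are samples, np.nan marks a missing value.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each missing value is replaced by the mean of that feature over the
# 2 nearest rows in which the feature is observed.
imputer = KNNImputer(n_neighbors=2, weights="uniform")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

In this toy example the missing value in the first row becomes 4.0, the mean of 3.0 and 5.0 from its two nearest rows. Passing weights="distance" instead would give closer neighbors more influence.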
Iterative (Regression) Imputation
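scikit-learn's IterativeImputer models each feature with missing values as a function of the other features and cycles through the features in round-robin fashion, refining the estimates on each pass. It is inspired by MICE (Multiple Imputation by Chained Equations), but by default it returns a single imputed dataset rather than multiple imputations. The estimator is still marked experimental, so it must be enabled with an explicit import from sklearn.experimental before use.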
Using IterativeImputer
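A minimal sketch of iterative imputation with scikit-learn. The toy data (second column roughly twice the first) and the max_iter and random_state settings are assumptions chosen for illustration; the default regressor, BayesianRidge, is used here.

```python
import numpy as np
# IterativeImputer is experimental: this import is required to expose it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Toy data (made up for illustration): the second column is about 2x the first.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, np.nan],
    [np.nan, 10.0],
])

# Each feature with missing values is regressed on the other features,
# and the estimates are refined for up to max_iter rounds.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Because the imputer learns the relationship between the two columns, the filled values should follow the linear trend instead of falling back to the column means. A different regressor can be supplied through the estimator parameter if a linear model is too restrictive.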
Choosing Between KNN and Iterative
Use KNN when your dataset is small-to-medium and local structure matters (e.g., geospatial data). Use IterativeImputer when features have complex, global relationships and you can afford longer compute time. Both should be fit on the training data only and then applied to validation and test data, so that information from held-out rows does not leak into the imputed values.
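The train-only fitting rule can be sketched as follows. The synthetic data, the 10% missingness rate, and the split sizes are assumptions for illustration; the same fit/transform pattern applies to IterativeImputer.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.model_selection import train_test_split

# Synthetic data (made up for illustration) with about 10% of entries set to NaN.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[rng.random(X.shape) < 0.1] = np.nan

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

imputer = KNNImputer(n_neighbors=5)
# Fit on the training rows only, so neighbors are drawn from training data.
X_train_filled = imputer.fit_transform(X_train)
# Reuse the fitted imputer on the test rows; do not refit here.
X_test_filled = imputer.transform(X_test)
```

Placing the imputer inside a scikit-learn Pipeline achieves the same separation automatically during cross-validation.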