
Advanced Imputation: KNN and Regression

When missing values have complex relationships with other features, simple mean/median imputation leaves predictive signal on the table. KNN and regression imputation use the structure of the data itself to generate smarter estimates.


KNN Imputation

KNN imputation finds the k most similar rows, measured by Euclidean distance over the features that both rows have observed, and fills each missing value with the average of that feature across those neighbors. It captures local patterns, but the pairwise distance computations make it expensive on large datasets.

Using KNNImputer

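<pre><code class="language-python">from sklearn.impute import KNNImputer

# X_train and X_test are assumed to be existing numeric feature matrices
imputer = KNNImputer(n_neighbors=5)  # n_neighbors is a hyperparameter: try 3, 5, 10
X_train_imputed = imputer.fit_transform(X_train)  # fit on training data only
X_test_imputed = imputer.transform(X_test)        # reuse the fitted imputer
</code></pre>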
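
To make the neighbor averaging concrete, here is a small sketch on a hypothetical 4×3 toy matrix (the values are made up for illustration). It assumes scikit-learn's default settings for KNNImputer, which measure distance with a NaN-aware Euclidean metric and weight neighbors uniformly.

<pre><code class="language-python">import numpy as np
from sklearn.impute import KNNImputer
from sklearn.metrics.pairwise import nan_euclidean_distances

# Hypothetical toy data: two cells are missing
X_toy = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Distances use only the coordinates both rows have observed
distances = nan_euclidean_distances(X_toy)
print(np.round(distances[0], 2))

# Each missing cell is filled with the mean of its 2 nearest donors
toy_imputer = KNNImputer(n_neighbors=2)
X_toy_imputed = toy_imputer.fit_transform(X_toy)
print(X_toy_imputed)
</code></pre>

Row 0 is closest to rows 1 and 2, so its missing third feature becomes the mean of 3 and 5, which is 4. Row 2's missing first feature is filled from rows 1 and 3, giving 5.5.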

Iterative (Regression) Imputation

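Scikit-learn's IterativeImputer models each feature with missing values as a function of the other features and refines the estimates over repeated round-robin passes. It is inspired by MICE (Multiple Imputation by Chained Equations), although by default it returns a single imputed dataset rather than several. The class is marked experimental in scikit-learn, which is why it has to be enabled with a separate import.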

Using IterativeImputer

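<pre><code class="language-python"># IterativeImputer is experimental; this import must come first to enable it
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

# X_train and X_test are assumed to be existing numeric feature matrices
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=10, random_state=0),
    max_iter=10,
    random_state=0,
)
X_train_imputed = imputer.fit_transform(X_train)  # fit on training data only
X_test_imputed = imputer.transform(X_test)        # reuse the fitted imputer
</code></pre>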
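
As a rough illustration of why modeling one feature from the others helps, the sketch below builds hypothetical synthetic data in which the second column is roughly twice the first, hides some of its values, and compares IterativeImputer (with its default estimator) against mean imputation. The data, the 20% missing rate, and the helper name rmse are all assumptions made for the example.

<pre><code class="language-python">import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, SimpleImputer

rng = np.random.default_rng(0)

# Hypothetical data: column 1 is about 2 * column 0, plus a little noise
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + rng.normal(scale=0.1, size=200)
X_true = np.column_stack([x0, x1])

# Hide 40 of the 200 values in column 1
hidden_rows = rng.choice(200, size=40, replace=False)
X_missing = X_true.copy()
X_missing[hidden_rows, 1] = np.nan

def rmse(X_filled):
    """Root mean squared error on the hidden cells only."""
    diff = X_filled[hidden_rows, 1] - X_true[hidden_rows, 1]
    return float(np.sqrt(np.mean(diff ** 2)))

rmse_mean = rmse(SimpleImputer(strategy="mean").fit_transform(X_missing))
rmse_iter = rmse(IterativeImputer(max_iter=10, random_state=0).fit_transform(X_missing))

print(f"mean imputation RMSE:      {rmse_mean:.3f}")
print(f"iterative imputation RMSE: {rmse_iter:.3f}")
</code></pre>

Because the hidden column is almost a linear function of the observed one, the regression-based fill should land much closer to the true values than the column mean does. On real data the gap depends on how strongly the features are related.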

Choosing Between KNN and Iterative

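Use KNN when your dataset is small-to-medium and local structure matters (e.g., geospatial data). Use IterativeImputer when features have complex, global relationships and you can afford longer compute time. In both cases, fit the imputer on the training data only and reuse it to transform the test data, so that no information from the test set leaks into the imputed values.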
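
One way to honor the training-only rule, and to compare the two imputers on your own data, is to place the imputer inside a scikit-learn pipeline so that cross-validation refits it on each training fold. The sketch below is a minimal version under stated assumptions: the dataset is synthetic, roughly 10% of cells are blanked at random, and Ridge stands in for whatever downstream model you actually use.

<pre><code class="language-python">import numpy as np
from sklearn.datasets import make_regression
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical data with about 10% of cells removed at random
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
rng = np.random.default_rng(0)
missing_mask = rng.choice([True, False], size=X_demo.shape, p=[0.1, 0.9])
X_demo[missing_mask] = np.nan

candidates = {
    "knn": KNNImputer(n_neighbors=5),
    "iterative": IterativeImputer(max_iter=10, random_state=0),
}

cv_scores = {}
for name, imputer in candidates.items():
    # The pipeline refits the imputer on each training fold, so the
    # held-out fold never influences the imputed values
    model = make_pipeline(imputer, Ridge())
    cv_scores[name] = cross_val_score(model, X_demo, y_demo, cv=5).mean()
    print(f"{name}: mean R^2 = {cv_scores[name]:.3f}")
</code></pre>

Which imputer scores higher will vary with the dataset, so treat the comparison as something to rerun on your own features rather than a general ranking.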