Random Forests: Out-of-Bag (OOB) Error
Because each tree is trained on a bootstrap sample drawn with replacement, about 37% of the training samples (roughly 1/e) are left out of any given tree. These out-of-bag samples serve as a built-in validation set at very little extra cost.
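The 37% figure follows from the chance that a given sample is missed in all n draws, (1 - 1/n)^n, which approaches 1/e as n grows. A minimal sketch that checks this numerically; the helper name `oob_fraction` and the sample size are illustrative choices, not part of any library:

```python
import math
import random


def oob_fraction(n_samples: int, seed: int = 0) -> float:
    """Fraction of samples never drawn in one bootstrap sample of size n_samples."""
    rng = random.Random(seed)
    # Draw n_samples indices with replacement and keep the distinct ones.
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1.0 - len(drawn) / n_samples


n = 10_000
empirical = oob_fraction(n)
theoretical = (1 - 1 / n) ** n  # probability a fixed sample is missed n times
limit = math.exp(-1)            # large-n limit, about 0.368

print(f"empirical={empirical:.3f} theoretical={theoretical:.3f} 1/e={limit:.3f}")
```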
How OOB Estimation Works
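For each training sample, predictions are collected only from trees that did not train on that sample, then aggregated (majority vote or averaged probabilities for classification, mean for regression). The OOB error is the error of these aggregated predictions over all training samples. With enough trees it is typically close to a cross-validation estimate, though not identical: each sample is scored by only about a third of the trees rather than by the full forest.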
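The procedure can be sketched by hand with individual decision trees. This is an illustration of the idea under stated assumptions (a synthetic binary dataset from `make_classification`, 100 trees, probability averaging), not scikit-learn's internal implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, assumed here purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
n_samples, n_trees = len(X), 100
n_classes = len(np.unique(y))

# Accumulated class probabilities from trees that did NOT see each sample.
votes = np.zeros((n_samples, n_classes))

for _ in range(n_trees):
    # Bootstrap sample: n_samples indices drawn with replacement.
    boot = rng.integers(0, n_samples, size=n_samples)
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[boot] = False  # samples absent from the bootstrap are out-of-bag

    tree = DecisionTreeClassifier(
        max_features="sqrt", random_state=int(rng.integers(1_000_000))
    )
    tree.fit(X[boot], y[boot])
    # Assumes every bootstrap sample contains both classes, which holds
    # with overwhelming probability for this balanced dataset.
    votes[oob_mask] += tree.predict_proba(X[oob_mask])

# Score only samples that were out-of-bag for at least one tree.
covered = votes.sum(axis=1) > 0
oob_pred = votes[covered].argmax(axis=1)
oob_error = float(np.mean(oob_pred != y[covered]))
print(f"manual OOB error: {oob_error:.3f}")
```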
Enabling OOB Scoring
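In scikit-learn, OOB scoring is switched on with `oob_score=True` (it requires `bootstrap=True`, the default). A minimal sketch, assuming a synthetic dataset and illustrative hyperparameters:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for a real training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,
    oob_score=True,   # compute the OOB estimate during fit
    bootstrap=True,   # required for OOB; this is the default
    random_state=0,
)
rf.fit(X, y)

oob_accuracy = rf.oob_score_   # accuracy for classifiers
oob_error = 1.0 - oob_accuracy
print(f"OOB accuracy: {oob_accuracy:.3f}, OOB error: {oob_error:.3f}")
```

For `RandomForestRegressor`, `oob_score_` is the R² of the OOB predictions rather than accuracy.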
OOB vs. Cross-Validation
OOB error is computed as a by-product of training, with only a small prediction overhead, which makes it well suited to quick model assessment. Cross-validation is generally more reliable on small datasets but requires one full training run per fold.
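A quick way to see how the two estimates relate is to compute both on the same data. The sketch below assumes a synthetic dataset and 5-fold CV. The exact numbers depend on the data and seed, so no output is shown:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# One training run yields the OOB estimate.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
oob_acc = rf.oob_score_

# 5-fold CV trains five separate forests on 80% of the data each.
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y, cv=5
)
cv_acc = cv_scores.mean()

print(f"OOB accuracy:  {oob_acc:.3f}")
print(f"5-fold CV acc: {cv_acc:.3f} (+/- {cv_scores.std():.3f})")
```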
When to Prefer OOB
Use OOB when training data is large (making CV expensive) or when you want a fast, preliminary estimate of generalization. For final model selection, confirm OOB results with k-fold CV.
Convergence with n_estimators
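OOB error usually drops quickly as trees are added and then flattens, so tracking it against `n_estimators` helps pick a forest size. A minimal sketch using `warm_start=True` so each step adds trees instead of refitting from scratch; the dataset and the 25-to-300 range are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# warm_start=True keeps already-fitted trees and only grows the new ones.
rf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)

oob_errors = {}
# Start at 25 trees: with very few trees some samples are never
# out-of-bag and scikit-learn warns that the estimate is unreliable.
for n in range(25, 301, 25):
    rf.set_params(n_estimators=n)
    rf.fit(X, y)
    oob_errors[n] = 1.0 - rf.oob_score_

for n, err in oob_errors.items():
    print(f"n_estimators={n:3d}  OOB error={err:.4f}")
```

Once the curve is flat, adding more trees mainly costs training time without improving the estimate.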
Interpreting OOB Decision Function
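For a classifier rf fitted with oob_score=True, rf.oob_decision_function_ is an array of shape (n_samples, n_classes) holding per-sample class probabilities estimated from OOB predictions only. It is useful for calibration, threshold analysis, and identifying hard-to-classify samples. With few trees, a sample may never be out-of-bag, in which case its row contains NaN.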
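As an example of threshold analysis, the OOB probabilities can be used in place of a held-out set to see how precision and recall trade off. The sketch assumes an imbalanced synthetic binary dataset and three arbitrary thresholds:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score

# Imbalanced binary data (about 20% positives), assumed for illustration.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.8, 0.2], random_state=0
)
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

proba = rf.oob_decision_function_  # shape (n_samples, n_classes)
p_pos = proba[:, 1]                # OOB probability of the positive class
valid = ~np.isnan(p_pos)           # drop samples that were never out-of-bag

for threshold in (0.3, 0.5, 0.7):
    pred = (p_pos[valid] >= threshold).astype(int)
    prec = precision_score(y[valid], pred, zero_division=0)
    rec = recall_score(y[valid], pred)
    print(f"threshold={threshold:.1f}  precision={prec:.3f}  recall={rec:.3f}")
```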
Spotting Difficult Samples
Samples with OOB probability close to 0.5 (binary case) lie near the decision boundary and may benefit from additional data collection, feature engineering, or a review of label consistency.
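One simple way to surface such samples is to rank them by distance from 0.5. The sketch below assumes a synthetic binary dataset; the cutoff of 20 samples is an arbitrary choice for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

p_pos = rf.oob_decision_function_[:, 1]

# Distance from the 0.5 boundary: small values mean the forest is unsure.
margin = np.abs(p_pos - 0.5)

# argsort places NaN (never out-of-bag) last, so those are not selected.
hard_idx = np.argsort(margin)[:20]

for i in hard_idx[:5]:
    print(f"sample {i}: OOB P(class 1)={p_pos[i]:.3f}, label={y[i]}")
```

The resulting indices can then be inspected manually or cross-checked against the original labels.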