Pruning Decision Trees to Prevent Overfitting
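A decision tree grown without constraints can keep splitting until every leaf is pure, so it fits the training data almost perfectly, noise included, and often generalizes poorly. Pruning limits or reduces the size of the tree to balance bias against variance.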
Pre-Pruning (Early Stopping)
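Pre-pruning stops tree growth early by enforcing constraints such as a maximum depth, a minimum number of samples required to split a node, or a minimum number of samples per leaf. A split that would violate a constraint is simply not made, so the tree never grows the deep, sparsely populated branches that tend to fit noise.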
Key Hyperparameters
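In scikit-learn the three constraints described above correspond to the max_depth, min_samples_split, and min_samples_leaf parameters of DecisionTreeClassifier. The following is a minimal sketch rather than a recommended configuration: the breast cancer dataset, the 75/25 split, and the specific values (4, 20, 10) are assumptions chosen only for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example dataset and split (assumptions for illustration only).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline: no growth constraints.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned tree: illustrative values, not tuned.
pre_pruned = DecisionTreeClassifier(
    max_depth=4,           # never grow deeper than 4 levels
    min_samples_split=20,  # a node needs at least 20 samples to be split
    min_samples_leaf=10,   # every leaf must keep at least 10 samples
    random_state=0,
).fit(X_train, y_train)

for name, model in [("unpruned", full_tree), ("pre-pruned", pre_pruned)]:
    print(
        f"{name}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
        f"train={model.score(X_train, y_train):.3f}, "
        f"test={model.score(X_test, y_test):.3f}"
    )
```

Comparing the depth, leaf count, and the gap between training and test accuracy of the two trees gives a quick sense of how much the constraints restrict the model.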
Post-Pruning: Cost-Complexity Pruning
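Cost-complexity pruning (also called minimal cost-complexity pruning, or CCP) grows a full tree and then prunes branches to minimize R_alpha(T) = R(T) + alpha * |T|, where R(T) is the total error of the leaves (traditionally the misclassification rate; scikit-learn uses the sample-weighted impurity of the leaves), |T| is the number of leaves, and alpha >= 0 is the complexity penalty. Larger values of alpha penalize leaves more heavily and produce smaller trees; alpha = 0 leaves the full tree untouched.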
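To make the formula concrete, the sketch below compares keeping a branch against collapsing it into a single leaf. The numbers are hypothetical, and the names R(t) (error of the collapsed node) and R(T_t) (error of the branch's leaves) are notation introduced here for the example. The branch is worth pruning once alpha exceeds the effective alpha (R(t) - R(T_t)) / (|T_t| - 1).

```python
def cost_complexity(error: float, n_leaves: int, alpha: float) -> float:
    """R_alpha(T) = R(T) + alpha * |T|."""
    return error + alpha * n_leaves


def effective_alpha(node_error: float, subtree_error: float, subtree_leaves: int) -> float:
    """Alpha at which collapsing a branch costs the same as keeping it."""
    return (node_error - subtree_error) / (subtree_leaves - 1)


# Hypothetical numbers for one branch of a tree.
node_error = 0.10     # R(t): error if the branch is collapsed into one leaf
subtree_error = 0.04  # R(T_t): error of the branch's leaves if it is kept
subtree_leaves = 4    # |T_t|: number of leaves in the branch

threshold = effective_alpha(node_error, subtree_error, subtree_leaves)
print(f"effective alpha = {threshold:.3f}")

for alpha in (0.01, 0.03):
    keep = cost_complexity(subtree_error, subtree_leaves, alpha)
    collapse = cost_complexity(node_error, 1, alpha)
    decision = "keep branch" if keep < collapse else "prune branch"
    print(f"alpha={alpha}: keep={keep:.2f}, collapse={collapse:.2f} -> {decision}")
```

With these numbers the effective alpha is about 0.02, so the branch survives at alpha = 0.01 and is pruned at alpha = 0.03.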
Finding the Best Alpha
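One way to enumerate candidate alphas is the cost_complexity_pruning_path method of scikit-learn's tree estimators, which returns the effective alphas at which the fully grown tree loses a branch. The sketch below fits one tree per candidate and records its size and accuracy. The dataset, the split, and the use of a single validation split are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example dataset and split (assumptions for illustration only).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Effective alphas of the pruning sequence, in ascending order.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
# Clip defensively: floating-point error can yield tiny negative values,
# and ccp_alpha must be non-negative.
ccp_alphas = np.clip(path.ccp_alphas, 0.0, None)

results = []
for alpha in ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    results.append(
        (alpha, tree.get_n_leaves(), tree.score(X_train, y_train), tree.score(X_val, y_val))
    )

for alpha, n_leaves, train_acc, val_acc in results:
    print(f"alpha={alpha:.5f} leaves={n_leaves:3d} train={train_acc:.3f} val={val_acc:.3f}")
```

Reading the table from top to bottom, the leaf count shrinks as alpha grows, and training accuracy generally falls while validation accuracy typically rises before dropping again.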
Selecting the Optimal Alpha
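Choose the ccp_alpha at which validation accuracy peaks, not test accuracy: tuning on the test set leaks information and makes the final score optimistic. When several values tie, the largest one gives the simplest tree. The search can be wrapped in GridSearchCV so that the choice is averaged over cross-validation folds, with the test set held back for a single final check.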
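A minimal sketch of that selection with GridSearchCV, assuming the candidate grid is taken from the pruning path of the training data and that 5-fold cross-validation with accuracy scoring is acceptable. The dataset and split are illustrative choices; the held-out test set is scored only once, after the alpha has been chosen.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Example dataset and split (assumptions for illustration only).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate alphas from the pruning path; the last one is dropped because
# it corresponds to the trivial tree with a single node.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
candidate_alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"ccp_alpha": candidate_alphas},
    cv=5,                # 5-fold cross-validation on the training data only
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_alpha = search.best_params_["ccp_alpha"]
test_acc = search.score(X_test, y_test)  # final check on untouched data
print(f"best ccp_alpha={best_alpha:.5f}, cv accuracy={search.best_score_:.3f}, "
      f"test accuracy={test_acc:.3f}")
```

Because GridSearchCV refits the best estimator on the full training set by default, search.best_estimator_ is the pruned tree ready for use.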
Bias-Variance Trade-off in Pruning
Aggressive pruning increases bias (underfitting), while too little pruning leaves variance high (overfitting). The goal is the sweet spot where validation error is minimized.
Practical Guidance
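As a starting point, try max_depth values between 3 and 10 and tune them with cross-validation. Cost-complexity pruning is more principled but requires more computation, since a full tree must be grown and several alpha values evaluated; it is the post-pruning method built into scikit-learn's tree estimators, exposed through the ccp_alpha parameter.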
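The max_depth suggestion above can be sketched as a small cross-validated search. The dataset and the 5-fold setting are assumptions for illustration, and the depth range simply mirrors the 3 to 10 starting range mentioned in the text.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Example dataset (an assumption for illustration only).
X, y = load_breast_cancer(return_X_y=True)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": list(range(3, 11))},  # depths 3 through 10
    cv=5,
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
print(f"best max_depth={best_depth}, cv accuracy={search.best_score_:.3f}")
```

If the best depth lands on the edge of the range, widening the grid is a reasonable next step.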