Random Forests: Architecture
Random Forests combine bagging with random feature subsets at each split to build an ensemble of decorrelated trees that collectively achieve strong generalization.
Building the Forest
Each tree is grown on a bootstrap sample of the training data, that is, a sample of the same size drawn with replacement. At every node, only a random subset of max_features features is considered for splitting, which keeps a single dominant feature from driving the top splits of every tree.
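The two sources of randomness described above can be sketched with the standard library alone. This is a minimal illustration, not how any particular library implements it; the helper names bootstrap_indices and candidate_features are hypothetical, and the rounding rule for 'sqrt' and 'log2' (truncate, at least one feature) is an assumption.

```python
import math
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def candidate_features(n_features, rng, max_features="sqrt"):
    """Pick the random feature subset examined at a single node."""
    if max_features == "sqrt":
        k = max(1, int(math.sqrt(n_features)))
    elif max_features == "log2":
        k = max(1, int(math.log2(n_features)))
    else:
        k = int(max_features)  # a fixed integer count
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)
rows = bootstrap_indices(1000, rng)            # rows used to grow one tree
unique_fraction = len(set(rows)) / len(rows)   # share of distinct rows drawn
node_features = candidate_features(16, rng)    # 4 of 16 features at this node

print(f"distinct rows in bootstrap sample: {unique_fraction:.2f}")
print("features considered at this node:", node_features)
```

Because sampling is with replacement, a bootstrap sample contains on average roughly 63% of the distinct rows; a fresh feature subset is drawn at every node, not once per tree.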
Training a Random Forest
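A minimal training sketch with scikit-learn's RandomForestClassifier follows. The synthetic dataset from make_classification, the split ratio, and every hyperparameter value are assumptions chosen for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; replace with your own feature matrix and labels.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

rf = RandomForestClassifier(
    n_estimators=200,     # number of trees in the ensemble
    max_features="sqrt",  # features examined at each split
    bootstrap=True,       # each tree is fit on a bootstrap sample
    random_state=0,       # fixed seed for reproducibility
)
rf.fit(X_train, y_train)

test_accuracy = rf.score(X_test, y_test)  # mean accuracy on held-out data
print(f"test accuracy: {test_accuracy:.3f}")
```

The fitted trees are available as rf.estimators_, which is convenient for inspecting how individual trees differ.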
Key Hyperparameters
Random Forests have several hyperparameters that control the size and diversity of the ensemble.
Tuning n_estimators and max_features
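n_estimators: More trees reduce the variance of the ensemble, with diminishing returns, while training and prediction cost grow linearly; 100–500 is typical.
max_features: 'sqrt' is standard for classification; 'log2' or a fixed integer are alternatives. Smaller values decorrelate the trees more but weaken each individual tree.
min_samples_leaf: The minimum number of training samples in a leaf. Larger values stop splitting earlier, which yields shallower trees and smoother predictions.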
GridSearch Tuning
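One way to tune these hyperparameters is an exhaustive grid search with cross-validation. The sketch below uses scikit-learn's GridSearchCV; the dataset, the grid values, and the choice of 3 folds are assumptions kept deliberately small so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data with 40 features, so 'sqrt' and 'log2' differ.
X, y = make_classification(
    n_samples=400, n_features=40, n_informative=6, random_state=0
)

# Illustrative grid: 2 * 3 * 2 = 12 candidate settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_features": ["sqrt", "log2", 12],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation per candidate
    scoring="accuracy",
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

With the default refit=True, search.best_estimator_ is a forest retrained on all of X with the winning setting. For larger grids, a randomized search is often cheaper than enumerating every combination.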
Prediction Aggregation
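For classification, each tree votes for a class. Under hard voting the forest returns the class with the most votes; under soft voting it averages the per-tree class probabilities and returns the class with the highest mean. scikit-learn's RandomForestClassifier uses the soft variant: predict returns the class with the highest averaged predict_proba.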
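Hard and soft voting can disagree when a minority of trees is very confident. The toy example below uses hand-written, hypothetical per-tree probabilities for a single sample (no real model is involved) to show the two aggregation rules side by side.

```python
import numpy as np

# Hypothetical per-tree class probabilities for ONE sample and two classes.
# Rows are trees, columns are classes [class 0, class 1].
tree_probs = np.array([
    [0.9, 0.1],  # tree 1 is confident in class 0
    [0.4, 0.6],  # tree 2 leans slightly toward class 1
    [0.4, 0.6],  # tree 3 leans slightly toward class 1
])

# Hard voting: each tree casts one vote for its most likely class.
votes = tree_probs.argmax(axis=1)
vote_counts = np.bincount(votes, minlength=tree_probs.shape[1])
hard_winner = int(vote_counts.argmax())

# Soft voting: average the probabilities, then take the most likely class.
mean_probs = tree_probs.mean(axis=0)
soft_winner = int(mean_probs.argmax())

print("hard vote winner:", hard_winner)
print("soft vote winner:", soft_winner, "mean probabilities:", mean_probs.round(3))
```

Here hard voting picks class 1 (two of three votes), while soft voting picks class 0 because its mean probability is about 0.57.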
Probability Outputs
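Calling rf.predict_proba(X_test) returns, for each class, the mean of the per-tree class probabilities. In scikit-learn, a tree's probability for a class is the fraction of training samples of that class in the leaf the sample falls into, so with fully grown trees and pure leaves this reduces to the fraction of trees voting for the class. These estimates can be thresholded or used in downstream decision-making, but they are not guaranteed to be well calibrated; check calibration on held-out data before treating them as true probabilities.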
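The sketch below shows one way to work with these outputs in scikit-learn: it reproduces predict_proba by averaging the individual trees' probabilities and then applies a custom decision threshold. The synthetic data, the min_samples_leaf setting, and the threshold of 0.3 are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=0)
rf.fit(X_train, y_train)

# Shape (n_samples, n_classes); columns follow the order of rf.classes_.
proba = rf.predict_proba(X_test)

# Reproduce the forest output by averaging the individual trees' probabilities.
per_tree = np.stack([tree.predict_proba(X_test) for tree in rf.estimators_])
manual_proba = per_tree.mean(axis=0)

# Apply a custom decision threshold to the positive class (label 1).
threshold = 0.3  # illustrative value, not a recommendation
positive_col = list(rf.classes_).index(1)
flagged = proba[:, positive_col] >= threshold

print("matches per-tree average:", np.allclose(proba, manual_proba))
print("samples flagged at threshold 0.3:", int(flagged.sum()))
```

Lowering the threshold below 0.5 flags more samples as positive, trading precision for recall; the right value depends on the relative cost of the two error types.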