Non-Maximum Suppression (NMS)
Object detectors often output multiple overlapping bounding boxes for a single physical object. Non-Maximum Suppression (NMS) is a standard post-processing step that removes these redundant predictions and keeps only the most confident box for each object.
The NMS Algorithm
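NMS works greedily. It sorts predictions by confidence score, keeps the highest-scoring box, and discards any other box of the same class that overlaps heavily with it. Overlap is measured by Intersection over Union (IoU): the area of the two boxes' intersection divided by the area of their union. IoU ranges from 0 for disjoint boxes to 1 for identical ones.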
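The following minimal pure-Python IoU sketch makes the overlap measure concrete. It assumes axis-aligned boxes in [x1, y1, x2, y2] format, the same convention as the PyTorch example in this section. The helper names box_area and iou are illustrative and do not come from any library.

```python
def box_area(box):
    """Area of an axis-aligned box given as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes, in [0, 1]."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp width and height at zero so disjoint boxes have no intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(box_a) + box_area(box_b) - inter
    return inter / union if union > 0 else 0.0


# The two overlapping boxes from the PyTorch example in this section
print(round(iou([10.0, 10.0, 100.0, 100.0], [12.0, 11.0, 98.0, 102.0]), 3))  # 0.925
```

Here the intersection is 86 × 89 = 7654 and the union is 8100 + 7826 - 7654 = 8272. The resulting IoU of about 0.925 is far above a typical 0.5 threshold.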
Step-by-Step Filtering
1. Sort all predicted boxes by confidence score, highest first.
2. Select the highest-scoring remaining box, add it to the output, and remove it from the candidate list.
3. Compute the IoU between this box and every remaining candidate box.
4. Discard every remaining box whose IoU exceeds a predefined threshold (e.g., 0.5).
5. Repeat steps 2 to 4 until no candidate boxes remain.
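The steps above can be sketched as a greedy loop in plain Python. This is a minimal, unoptimized illustration that assumes a single class and [x1, y1, x2, y2] boxes. The function names iou and nms are hypothetical helpers. Production code would normally use a vectorized routine such as torchvision.ops.nms instead.

```python
def iou(a, b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy single-class NMS; returns kept indices, highest score first."""
    # Step 1: candidate indices sorted by descending confidence
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:  # Step 5: loop until no candidates remain
        # Step 2: keep the highest-scoring remaining box
        best = order.pop(0)
        keep.append(best)
        # Steps 3-4: drop remaining boxes whose IoU with it exceeds the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [
    [10.0, 10.0, 100.0, 100.0],
    [12.0, 11.0, 98.0, 102.0],   # redundant box, IoU ~0.93 with box 0
    [200.0, 200.0, 300.0, 300.0],
]
scores = [0.9, 0.75, 0.85]
print(nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
```

On the same three boxes used in the PyTorch example, this sketch returns [0, 2], which matches the torchvision result. Each pass compares the kept box against all remaining candidates, so the worst case is quadratic in the number of boxes. That cost is fine for a handful of boxes but is one reason real pipelines use vectorized GPU kernels.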
PyTorch Implementation
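Torchvision provides torchvision.ops.nms, which runs on the CPU or GPU depending on where the input tensors live. It takes boxes in [x1, y1, x2, y2] format, one score per box, and an IoU threshold. It returns the indices of the kept boxes, sorted by decreasing score.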
<pre><code class="language-python">import torchvision.ops as ops
import torch

# Bounding boxes: [x1, y1, x2, y2]
boxes = torch.tensor([
    [10.0, 10.0, 100.0, 100.0],
    [12.0, 11.0, 98.0, 102.0],  # Redundant box with high IoU
    [200.0, 200.0, 300.0, 300.0]
], dtype=torch.float32)
scores = torch.tensor([0.9, 0.75, 0.85], dtype=torch.float32)

# Keep indices of selected boxes
keep_indices = ops.nms(boxes, scores, iou_threshold=0.5)
print(keep_indices)  # tensor([0, 2])</code></pre>
Advanced NMS Variations
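Standard NMS can wrongly suppress correct detections when objects of the same class overlap heavily, such as people standing in a crowd. In that case, a second person's box may exceed the IoU threshold against the first and be discarded. Some variations address this by lowering confidence scores instead of discarding boxes entirely.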
Soft-NMS
Instead of setting the score of an overlapping box to zero, Soft-NMS decays it by an amount that grows with the box's IoU against the selected box. Common rules multiply the score by (1 - IoU) or by a Gaussian function of the IoU. As a result, close but distinct objects are ranked lower rather than removed outright.
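Below is a minimal Soft-NMS sketch. It assumes the two decay rules proposed with Soft-NMS:

- A linear rule that multiplies a score by (1 - IoU) when the IoU is at or above a threshold.
- A Gaussian rule that multiplies a score by exp(-IoU^2 / sigma).

The function name soft_nms and the parameters method, sigma = 0.5 and score_threshold = 0.001 are hypothetical choices for illustration, not a specific library's API.

```python
import math


def iou(a, b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, method="gaussian", iou_threshold=0.5,
             sigma=0.5, score_threshold=0.001):
    """Soft-NMS sketch: decay overlapping scores instead of zeroing them.

    Returns (index, score) pairs in selection order. iou_threshold is used
    only by the linear rule; sigma only by the Gaussian rule. All defaults
    here are assumptions for illustration.
    """
    scores = list(scores)  # work on a copy of the caller's scores
    remaining = list(range(len(boxes)))
    selected = []
    while remaining:
        # Select the remaining box with the highest (possibly decayed) score
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        selected.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if method == "linear":
                # Linear rule: scale by (1 - IoU) only at or above the threshold
                if overlap >= iou_threshold:
                    scores[i] *= 1.0 - overlap
            else:
                # Gaussian rule: penalty grows smoothly with IoU
                scores[i] *= math.exp(-(overlap ** 2) / sigma)
        # Prune boxes whose decayed score has fallen below the floor
        remaining = [i for i in remaining if scores[i] >= score_threshold]
    return selected


# Two people standing close together: IoU = 3500 / 6500, about 0.54
people = [[0.0, 0.0, 50.0, 100.0], [15.0, 0.0, 65.0, 100.0]]
print(soft_nms(people, [0.95, 0.8]))  # both kept; the second with a lower score
```

With standard NMS at a 0.5 threshold, the second person would be discarded, since its IoU with the first is about 0.54. Under the Gaussian rule here, it survives with its score reduced from 0.8 to roughly 0.45. A later confidence cutoff can then decide whether to report it.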
Class-Agnostic vs. Class-Specific
Class-specific NMS only lets a box suppress other boxes with the same predicted class. Class-agnostic NMS lets any box suppress any other overlapping box, regardless of its predicted label. This is useful for preventing the same region from being reported under two classes at once (e.g., as both a car and a truck).
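The difference between the two modes can be sketched in plain Python. This is a minimal illustration that assumes string class labels and [x1, y1, x2, y2] boxes. The helpers nms, class_specific_nms and class_agnostic_nms are hypothetical names. The class-specific version simply runs greedy NMS separately within each label group.

```python
def iou(a, b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS over all given boxes; returns kept indices by descending score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


def class_specific_nms(boxes, scores, labels, iou_threshold=0.5):
    """Run NMS per class, so a box only suppresses boxes with the same label."""
    keep = []
    for label in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        kept_local = nms([boxes[i] for i in idx], [scores[i] for i in idx], iou_threshold)
        keep.extend(idx[k] for k in kept_local)  # map back to global indices
    # Return indices in descending score order, like a single NMS call
    return sorted(keep, key=lambda i: scores[i], reverse=True)


def class_agnostic_nms(boxes, scores, iou_threshold=0.5):
    """Ignore labels entirely: any box can suppress any other box."""
    return nms(boxes, scores, iou_threshold)


# A car and a truck prediction covering the same region
boxes = [[10.0, 10.0, 100.0, 100.0], [12.0, 11.0, 98.0, 102.0]]
scores = [0.9, 0.75]
labels = ["car", "truck"]
print(class_specific_nms(boxes, scores, labels))  # [0, 1]: different labels, both kept
print(class_agnostic_nms(boxes, scores))          # [0]: the truck box is suppressed
```

In PyTorch, torchvision.ops.batched_nms(boxes, scores, idxs, iou_threshold) performs class-specific NMS, with idxs holding each box's class index. Passing the same index for every box makes it behave like class-agnostic NMS.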