Bounding Boxes and Intersection over Union (IoU)
Bounding boxes define the spatial boundaries of objects in an image, and evaluating their accuracy requires a metric that measures overlap. Intersection over Union (IoU) is the standard metric used to quantify how closely a predicted bounding box aligns with the ground truth.
Bounding Box Coordinate Formats
Bounding boxes are defined by four numbers, but the convention differs between datasets and models. The two most common dataset conventions are the Pascal VOC corner format and the COCO top-left/width-height format; many detection models additionally use a center/width-height format.
Corner vs Center Formats
The corner format (Pascal VOC) uses [x_{min}, y_{min}, x_{max}, y_{max}] to represent the top-left and bottom-right corners. The COCO format uses [x_{min}, y_{min}, width, height], storing the top-left corner plus the box size, while some models prefer the center format [x_{center}, y_{center}, width, height].
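The relationships between the three layouts reduce to a few additions and divisions. The following is a minimal plain-Python sketch; the helper names (`xywh_to_xyxy`, `xyxy_to_cxcywh`, `cxcywh_to_xyxy`) and the sample box are illustrative assumptions, not part of any library.

```python
def xywh_to_xyxy(box):
    """[x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_cxcywh(box):
    """[x_min, y_min, x_max, y_max] -> [x_center, y_center, width, height]."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]


def cxcywh_to_xyxy(box):
    """[x_center, y_center, width, height] -> [x_min, y_min, x_max, y_max]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


# Hypothetical box in COCO layout: top-left corner (10, 20), size 90 x 80
coco_box = [10.0, 20.0, 90.0, 80.0]
corner_box = xywh_to_xyxy(coco_box)      # [10.0, 20.0, 100.0, 100.0]
center_box = xyxy_to_cxcywh(corner_box)  # [55.0, 60.0, 90.0, 80.0]
print(corner_box, center_box)
```

Converting the center-format result back with `cxcywh_to_xyxy` recovers the original corner box, which is a quick sanity check when wiring up a data pipeline.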
Conversions in PyTorch
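You can convert between formats manually with a few arithmetic operations, or use torchvision.ops.box_convert, which converts a tensor of boxes between the xyxy (corner), xywh (top-left plus size), and cxcywh (center plus size) formats.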
Intersection over Union (IoU)
Intersection over Union (IoU) measures the degree of overlap between two boxes. It is calculated by dividing the area of overlap (intersection) by the area of the union, that is, the area covered by either box with the overlapping region counted only once.
The IoU Formula
Mathematically, the IoU is defined as: IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}} = \frac{|A \cap B|}{|A \cup B|}. The value ranges from 0 (no overlap) to 1 (perfect alignment).
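To make the formula concrete, here is a small worked example with two hypothetical boxes in corner format, A = [0, 0, 2, 2] and B = [1, 1, 3, 3], chosen only for easy arithmetic. The union is obtained by inclusion-exclusion: the sum of both areas minus the intersection.

```latex
\begin{aligned}
|A| &= (2 - 0)(2 - 0) = 4, \qquad |B| = (3 - 1)(3 - 1) = 4 \\
|A \cap B| &= (2 - 1)(2 - 1) = 1 \\
|A \cup B| &= |A| + |B| - |A \cap B| = 4 + 4 - 1 = 7 \\
\text{IoU} &= \frac{|A \cap B|}{|A \cup B|} = \frac{1}{7} \approx 0.143
\end{aligned}
```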
Implementing IoU in PyTorch
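Calculating IoU requires three steps: finding the corners of the intersection rectangle, computing its area (clamped at zero when the boxes do not overlap), and computing the union as the sum of the two box areas minus the intersection.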
<pre><code class="language-python">def calculate_iou(box1, box2):
    # box1, box2 shapes: [4] -> [x1, y1, x2, y2] (corner format)
    # Corners of the intersection rectangle
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])

    # Clamp at zero so non-overlapping boxes yield zero intersection
    inter_area = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)

    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])

    # Union by inclusion-exclusion: the overlap is counted only once
    union_area = box1_area + box2_area - inter_area

    # Guard against division by zero for degenerate (zero-area) boxes
    return inter_area / union_area if union_area > 0 else 0.0


b1 = [10.0, 20.0, 100.0, 100.0]
b2 = [50.0, 50.0, 150.0, 150.0]
print(calculate_iou(b1, b2))  # ≈ 0.1701 (2500 / 14700)
</code></pre>
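The scalar function above handles one pair of boxes at a time. In a PyTorch pipeline, it is usually more convenient to compute IoU for every prediction/target pair at once. The following is a minimal sketch using broadcasting; the function name `pairwise_iou` and the sample tensors are illustrative assumptions, and boxes are assumed to be in corner (xyxy) format. torchvision also provides `torchvision.ops.box_iou` for the same purpose.

```python
import torch


def pairwise_iou(boxes1, boxes2):
    """IoU between every box in boxes1 [N, 4] and boxes2 [M, 4] (xyxy)."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])  # [N]
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])  # [M]

    # Broadcast [N, 1, 2] against [M, 2] to get intersection corners [N, M, 2]
    top_left = torch.max(boxes1[:, None, :2], boxes2[:, :2])
    bottom_right = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])

    # Clamp at zero so disjoint boxes contribute no intersection area
    wh = (bottom_right - top_left).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]  # [N, M]

    union = area1[:, None] + area2 - inter  # [N, M]

    # Return 0 where the union is zero (degenerate boxes) instead of NaN
    return torch.where(union > 0, inter / union, torch.zeros_like(union))


preds = torch.tensor([[10.0, 20.0, 100.0, 100.0],
                      [0.0, 0.0, 10.0, 10.0]])
targets = torch.tensor([[50.0, 50.0, 150.0, 150.0]])

iou_matrix = pairwise_iou(preds, targets)  # shape [2, 1]
print(iou_matrix)
```

The first row reproduces the scalar example (2500 / 14700), while the second prediction does not overlap the target and therefore gets an IoU of 0.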