
Bounding Boxes and Intersection over Union (IoU)

Bounding boxes define the spatial boundaries of objects in an image, and evaluating their accuracy requires a metric that measures overlap. Intersection over Union (IoU) is the standard metric used to quantify how closely a predicted bounding box aligns with the ground truth.


Bounding Box Coordinate Formats

A bounding box is described by four numbers, but their meaning differs between datasets and models. The most common conventions are the Pascal VOC corner format, the COCO top-left-plus-size format, and the center-based format that some detection models use.

Corner vs Center Formats

The corner format (Pascal VOC) uses [x_{min}, y_{min}, x_{max}, y_{max}] to represent the top-left and bottom-right corners. The COCO format uses [x_{min}, y_{min}, width, height], that is, the top-left corner plus the box size; it is not a center format. The center format, [x_{center}, y_{center}, width, height], is preferred by some models.

Conversions in PyTorch

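You can convert between formats manually with a few arithmetic operations, or use torchvision.ops.box_convert, which takes the input and output formats as strings: 'xyxy' for corners, 'xywh' for top-left plus size, and 'cxcywh' for center plus size.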

<pre><code class="language-python">import torch
import torchvision.ops as ops

# Box in [x_min, y_min, x_max, y_max] corner format
boxes_xyxy = torch.tensor([[10, 20, 110, 120]], dtype=torch.float32)

# Convert to [x_center, y_center, width, height] format
boxes_cxcywh = ops.box_convert(boxes_xyxy, in_fmt='xyxy', out_fmt='cxcywh')
print(boxes_cxcywh)  # tensor([[ 60.,  70., 100., 100.]])</code></pre>
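The manual route mentioned above comes down to a few subtractions and additions per box. The following is a minimal sketch of it, assuming boxes are stored as float tensors whose last dimension holds the four coordinates. The helper names `xyxy_to_xywh`, `xyxy_to_cxcywh`, and `cxcywh_to_xyxy` are hypothetical and chosen here for illustration; they are not part of torchvision.

```python
import torch


def xyxy_to_xywh(boxes: torch.Tensor) -> torch.Tensor:
    """[x_min, y_min, x_max, y_max] -> [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = boxes.unbind(dim=-1)
    return torch.stack([x_min, y_min, x_max - x_min, y_max - y_min], dim=-1)


def xyxy_to_cxcywh(boxes: torch.Tensor) -> torch.Tensor:
    """[x_min, y_min, x_max, y_max] -> [x_center, y_center, width, height]."""
    x_min, y_min, x_max, y_max = boxes.unbind(dim=-1)
    width = x_max - x_min
    height = y_max - y_min
    return torch.stack([x_min + width / 2, y_min + height / 2, width, height], dim=-1)


def cxcywh_to_xyxy(boxes: torch.Tensor) -> torch.Tensor:
    """[x_center, y_center, width, height] -> [x_min, y_min, x_max, y_max]."""
    cx, cy, width, height = boxes.unbind(dim=-1)
    half_w = width / 2
    half_h = height / 2
    return torch.stack([cx - half_w, cy - half_h, cx + half_w, cy + half_h], dim=-1)


boxes_xyxy = torch.tensor([[10.0, 20.0, 110.0, 120.0]])
print(xyxy_to_xywh(boxes_xyxy).tolist())    # [[10.0, 20.0, 100.0, 100.0]]
print(xyxy_to_cxcywh(boxes_xyxy).tolist())  # [[60.0, 70.0, 100.0, 100.0]]
```

Because the functions only unbind and restack the last dimension, the same code works for a single box of shape [4] or a batch of shape [N, 4].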

Intersection over Union (IoU)

Intersection over Union (IoU) measures the degree of overlap between two boxes. It is calculated by dividing the area where the two boxes overlap (intersection) by the total area covered by either box (union).

The IoU Formula

Mathematically, IoU is defined as: IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}} = \frac{|A \cap B|}{|A \cup B|}. The value ranges from 0 (no overlap) to 1 (perfect alignment).
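As a worked instance of the formula, take two boxes in corner format, A = [10, 20, 100, 100] and B = [50, 50, 150, 150]. These values are chosen purely for illustration. The intersection rectangle runs from (50, 50) to (100, 100), and the union is the sum of both areas minus the intersection, so that the overlap is not counted twice:

```latex
\begin{aligned}
|A| &= (100 - 10)(100 - 20) = 7200 \\
|B| &= (150 - 50)(150 - 50) = 10000 \\
|A \cap B| &= (100 - 50)(100 - 50) = 2500 \\
|A \cup B| &= |A| + |B| - |A \cap B| = 14700 \\
\mathrm{IoU} &= \frac{2500}{14700} \approx 0.170
\end{aligned}
```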

Implementing IoU in PyTorch

Calculating IoU takes three steps: find the corners of the intersection rectangle, compute its area (clamped to zero when the boxes do not overlap), and obtain the union as the sum of the two box areas minus the intersection.

<pre><code class="language-python">def calculate_iou(box1, box2):
    # box1, box2: four values each, [x1, y1, x2, y2] in corner format

    # Corners of the intersection rectangle
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])

    # Clamp to zero so that non-overlapping boxes give an empty intersection
    inter_area = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)

    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])

    # Subtract the intersection once so it is not counted twice
    union_area = box1_area + box2_area - inter_area

    return inter_area / union_area if union_area > 0 else 0.0


b1 = [10.0, 20.0, 100.0, 100.0]
b2 = [50.0, 50.0, 150.0, 150.0]
print(calculate_iou(b1, b2))  # approximately 0.1701 (2500 / 14700)</code></pre>
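The function above handles one pair of boxes at a time. In practice, every prediction is often compared against every ground-truth box, which can be done in one pass with tensor broadcasting. The following is a minimal sketch, assuming both inputs are float tensors in corner format with shapes [N, 4] and [M, 4]. The name `pairwise_iou` is hypothetical; torchvision ships a comparable function, torchvision.ops.box_iou, which is the usual choice in real code.

```python
import torch


def pairwise_iou(boxes1: torch.Tensor, boxes2: torch.Tensor) -> torch.Tensor:
    """IoU of every box in boxes1 [N, 4] against every box in boxes2 [M, 4].

    Both inputs use [x1, y1, x2, y2] corner format. Returns a tensor of shape [N, M].
    """
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])  # [N]
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])  # [M]

    # Broadcasting [N, 1, 2] against [M, 2] yields [N, M, 2]
    top_left = torch.max(boxes1[:, None, :2], boxes2[:, :2])
    bottom_right = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])

    # Negative widths or heights mean no overlap, so clamp them to zero
    wh = (bottom_right - top_left).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]  # [N, M]

    union = area1[:, None] + area2 - inter  # [N, M]

    # Return 0 where the union is empty (degenerate zero-area boxes)
    return torch.where(union > 0, inter / union, torch.zeros_like(inter))


preds = torch.tensor([[10.0, 20.0, 100.0, 100.0]])
targets = torch.tensor([
    [50.0, 50.0, 150.0, 150.0],    # partial overlap
    [10.0, 20.0, 100.0, 100.0],    # identical to the prediction
    [200.0, 200.0, 300.0, 300.0],  # no overlap
])
iou = pairwise_iou(preds, targets)
print(iou.shape)  # torch.Size([1, 3])
print(iou)        # roughly [[0.1701, 1.0000, 0.0000]]
```

The three targets cover the range of the metric: a partial overlap, a perfect match at 1, and disjoint boxes at 0.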