Panoptic Segmentation

Panoptic segmentation unifies semantic and instance segmentation into a single task. It assigns every pixel in an image a semantic class label and an instance ID, giving a complete scene description that covers both background 'stuff' and foreground 'things'.

Things vs. Stuff

Panoptic segmentation splits the categories in a scene into two groups: countable objects (things) and amorphous background regions (stuff).

Definitions & Examples

Things are distinct, countable instances (e.g., person, car, animal). Stuff represents background elements that lack individual instance boundaries (e.g., sky, grass, water, road).

Panoptic Representation

The output of panoptic segmentation is a single label map where each pixel contains a tuple: [class_id, instance_id]. For 'stuff' pixels, the instance ID is ignored or set to a default null value.
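The per-pixel tuple can be illustrated with a small, plain-Python sketch. The class ids, the category tables, the null instance value of 0, and the helper names `make_pixel` and `count_instances` are all assumptions chosen for illustration; real datasets define their own ids and encodings.

```python
# Hypothetical category tables: which class ids are 'things' and which are 'stuff'.
THING_CLASSES = {1: "person", 2: "car"}
STUFF_CLASSES = {10: "sky", 11: "road"}
NULL_INSTANCE = 0  # assumed default instance id for stuff pixels


def make_pixel(class_id, instance_id=NULL_INSTANCE):
    """Return the (class_id, instance_id) tuple stored for one pixel."""
    if class_id in STUFF_CLASSES:
        # Stuff has no instances, so any instance id is replaced by the null value.
        return (class_id, NULL_INSTANCE)
    if class_id not in THING_CLASSES:
        raise ValueError(f"unknown class id: {class_id}")
    return (class_id, instance_id)


def count_instances(label_map):
    """Count distinct thing instances per class in a 2D label map."""
    seen = set()
    for row in label_map:
        for class_id, instance_id in row:
            if class_id in THING_CLASSES:
                seen.add((class_id, instance_id))
    counts = {}
    for class_id, _ in seen:
        counts[class_id] = counts.get(class_id, 0) + 1
    return counts


# A 2x4 toy image: sky on the top row, two cars and some road on the bottom row.
label_map = [
    [make_pixel(10), make_pixel(10), make_pixel(10), make_pixel(10)],
    [make_pixel(2, 1), make_pixel(11), make_pixel(2, 2), make_pixel(2, 2)],
]
car_counts = count_instances(label_map)
```

Because every pixel carries both ids, the same map answers semantic questions (which pixels are road?) and instance questions (how many cars are there?).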

Evaluation Metrics: Panoptic Quality (PQ)

Evaluating panoptic segmentation requires a metric that balances classification correctness and segmentation quality. Panoptic Quality (PQ) is the standard metric used.

The PQ Formula

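PQ factors into Segmentation Quality (SQ), the mean IoU of matched segments, and Recognition Quality (RQ), an F1-style detection score. A predicted segment p and a ground-truth segment g count as a true positive (TP) when IoU(p, g) > 0.5; unmatched predictions are false positives (FP) and unmatched ground-truth segments are false negatives (FN): PQ = \text{SQ} \times \text{RQ} = \frac{\sum_{(p, g) \in TP} \text{IoU}(p, g)}{|TP|} \times \frac{|TP|}{|TP| + 0.5 |FP| + 0.5 |FN|}.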
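A minimal sketch of the PQ computation for a single class follows. It assumes segments are given as sets of (row, col) pixels, that segments within each list do not overlap, and that a prediction matches a ground-truth segment when their IoU is strictly above 0.5, the usual matching rule for PQ. The function names `iou` and `panoptic_quality` are illustrative, not taken from any library.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred_segments, gt_segments):
    """Return (PQ, SQ, RQ) for one class.

    Both arguments are lists of pixel sets. With non-overlapping segments,
    an IoU above 0.5 can hold for at most one prediction per ground-truth
    segment, so a simple greedy scan is enough to find the matches.
    """
    matched_pred, matched_gt, tp_ious = set(), set(), []
    for gi, g in enumerate(gt_segments):
        for pi, p in enumerate(pred_segments):
            if pi in matched_pred:
                continue
            score = iou(p, g)
            if score > 0.5:
                matched_pred.add(pi)
                matched_gt.add(gi)
                tp_ious.append(score)
                break
    tp = len(tp_ious)
    fp = len(pred_segments) - len(matched_pred)  # unmatched predictions
    fn = len(gt_segments) - len(matched_gt)      # unmatched ground truth
    if tp == 0:
        return 0.0, 0.0, 0.0
    sq = sum(tp_ious) / tp
    rq = tp / (tp + 0.5 * fp + 0.5 * fn)
    return sq * rq, sq, rq


# Toy example: one good match (IoU 3/4), one spurious prediction, one missed segment.
gt = [{(0, 0), (0, 1), (0, 2), (0, 3)}, {(1, 0), (1, 1)}]
pred = [{(0, 0), (0, 1), (0, 2)}, {(2, 0)}]
pq, sq, rq = panoptic_quality(pred, gt)
```

In this toy case there is one TP with IoU 0.75, one FP, and one FN, so SQ = 0.75, RQ = 1 / (1 + 0.5 + 0.5) = 0.5, and PQ = 0.375. In practice PQ is computed per class and then averaged over classes.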

Unified Architectures

Networks such as Panoptic FPN use a shared backbone with two heads: a semantic segmentation branch for 'stuff' and a Mask R-CNN-style instance head for 'things'. A final post-processing step merges the two outputs into a single panoptic map.
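The merging step can be sketched as a simple heuristic. This is a simplified illustration, not the exact procedure of any particular network: it assumes instance overlaps are resolved by confidence score, that instance predictions take priority over the semantic branch, and that leftover pixels the semantic branch labels as a thing class are marked void. The name `merge_outputs`, the dictionary layout, and the `VOID` label are assumptions.

```python
VOID = (0, 0)  # assumed label for pixels that neither head claims


def merge_outputs(semantic_map, instances, stuff_classes):
    """Merge a semantic map and instance masks into one panoptic map.

    semantic_map: 2D list of class ids from the semantic branch.
    instances: list of dicts with 'class_id', 'score', and 'mask'
               (a set of (row, col) pixels) from the instance head.
    stuff_classes: set of class ids treated as stuff.
    """
    height, width = len(semantic_map), len(semantic_map[0])
    panoptic = [[None] * width for _ in range(height)]

    # 1. Paste instances in descending score order, so a higher-scoring
    #    instance keeps any pixel that a lower-scoring one also claims.
    ordered = sorted(instances, key=lambda inst: inst["score"], reverse=True)
    for instance_id, inst in enumerate(ordered, start=1):
        for r, c in inst["mask"]:
            if panoptic[r][c] is None:
                panoptic[r][c] = (inst["class_id"], instance_id)

    # 2. Fill the remaining pixels from the semantic branch. Only stuff
    #    classes are accepted here; anything else becomes void.
    for r in range(height):
        for c in range(width):
            if panoptic[r][c] is None:
                cls = semantic_map[r][c]
                panoptic[r][c] = (cls, 0) if cls in stuff_classes else VOID
    return panoptic


# Toy 2x3 example: classes 10 and 11 are stuff, class 2 is a thing.
semantic_map = [[10, 10, 2], [11, 2, 2]]
instances = [
    {"class_id": 2, "score": 0.9, "mask": {(0, 2), (1, 2)}},
    {"class_id": 2, "score": 0.6, "mask": {(1, 1), (1, 2)}},
]
panoptic = merge_outputs(semantic_map, instances, stuff_classes={10, 11})
```

Both instances claim pixel (1, 2); the higher-scoring one keeps it, and the lower-scoring instance retains only pixel (1, 1). Practical pipelines usually add further filters, such as dropping very small or heavily occluded segments.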