The U-Net Architecture for Segmentation

The U-Net architecture, introduced by Ronneberger et al. for biomedical image segmentation, is a symmetric encoder-decoder network that is now widely used for segmentation tasks in general. Its skip connections carry high-resolution spatial detail from the contracting path directly to the expanding path, which helps preserve fine object boundaries.


Encoder-Decoder and Skip Connections

U-Net consists of a contracting path (encoder) that extracts features and a symmetric expanding path (decoder) that recovers spatial dimensions. Skip connections join the two paths, linking each encoder stage to the decoder stage at the same resolution.

The U-Shaped Flow

The encoder reduces spatial dimensions while increasing channel depth. The decoder reverses this using upsampling. Each skip connection concatenates the high-resolution encoder features with the upsampled decoder features along the channel dimension before the next convolution, compensating for spatial information lost during max pooling.
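The flow described above can be traced with tensor shapes. This is a minimal single-level sketch, not a full U-Net: the channel counts (16, 32), the 64x64 input size, and all variable names are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical single-level encoder/decoder, used only to trace shapes.
x = torch.randn(1, 1, 64, 64)  # (batch, channels, height, width)

enc_conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)      # increases channel depth
pool = nn.MaxPool2d(kernel_size=2)                         # halves height and width
bottleneck = nn.Conv2d(16, 32, kernel_size=3, padding=1)   # deeper features at low resolution
up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)   # doubles height and width

skip = enc_conv(x)                # high-resolution encoder features
down = bottleneck(pool(skip))     # half the resolution, twice the channels
restored = up(down)               # back to the encoder's resolution
merged = torch.cat([restored, skip], dim=1)  # skip connection: concatenate channels

print(skip.shape)      # torch.Size([1, 16, 64, 64])
print(down.shape)      # torch.Size([1, 32, 32, 32])
print(restored.shape)  # torch.Size([1, 16, 64, 64])
print(merged.shape)    # torch.Size([1, 32, 64, 64])
```

The concatenated tensor has twice the channels of either input, which is why the convolution that follows a skip connection must accept the combined channel count.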

Loss Functions: Cross Entropy and Dice Loss

In segmentation, classes (such as tumor vs. background) are often highly imbalanced. To train models effectively under this imbalance, we combine Cross-Entropy loss with Dice Loss, which measures the overlap between the predicted mask P and the ground-truth mask G: L_{Dice} = 1 - \frac{2 |P \cap G|}{|P| + |G|}.
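As a rough sketch of how the two losses might be combined for a binary mask, the code below uses a "soft" Dice loss, where predicted probabilities stand in for the set P so that the loss stays differentiable. The function names, the smoothing term `eps`, and the equal weighting via `dice_weight` are assumptions for illustration, not a prescribed recipe.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for binary masks; logits and target are (N, 1, H, W)."""
    probs = torch.sigmoid(logits)
    dims = (1, 2, 3)
    intersection = (probs * target).sum(dim=dims)             # soft |P ∩ G|
    denom = probs.sum(dim=dims) + target.sum(dim=dims)        # |P| + |G|
    dice = (2 * intersection + eps) / (denom + eps)           # eps avoids 0/0 on empty masks
    return 1 - dice.mean()

def combined_loss(logits, target, dice_weight=0.5):
    """Weighted sum of binary cross-entropy and soft Dice loss (weight is an assumption)."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return (1 - dice_weight) * bce + dice_weight * dice_loss(logits, target)

# Toy 4x4 ground-truth mask with a 2x2 foreground square.
target = torch.zeros(1, 1, 4, 4)
target[:, :, 1:3, 1:3] = 1.0

# Confident, correct logits: large positive inside the mask, large negative outside.
perfect_logits = (target * 2 - 1) * 20.0

print(dice_loss(perfect_logits, target).item())       # close to 0 for a perfect prediction
print(dice_loss(-perfect_logits, target).item())      # close to 1 for an inverted prediction
print(combined_loss(perfect_logits, target).item())   # close to 0
```

Because the Dice term is normalized by mask size, a small foreground region contributes as much to it as a large one, which is why it is often paired with cross-entropy when classes are imbalanced.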

Building U-Net in PyTorch

Implementing U-Net requires writing modular blocks for downsampling, upsampling, concatenating skip connections, and producing output channels.
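Two of these blocks, downsampling and the output projection, might look like the following. This is a sketch under assumptions: the class names `DownBlock` and `OutputHead` are hypothetical, and each block uses a single 3x3 convolution for brevity.

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """Hypothetical encoder step: conv + ReLU, then 2x2 max pooling."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=2)

    def forward(self, x):
        skip = self.conv(x)            # full-resolution features, kept for the skip connection
        return self.pool(skip), skip   # pooled features continue down the encoder

class OutputHead(nn.Module):
    """Hypothetical output layer: 1x1 convolution mapping features to per-class logits."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, x):
        return self.proj(x)

x = torch.randn(1, 3, 64, 64)
pooled, skip = DownBlock(3, 32)(x)
logits = OutputHead(32, num_classes=2)(skip)

print(pooled.shape)  # torch.Size([1, 32, 32, 32])
print(skip.shape)    # torch.Size([1, 32, 64, 64])
print(logits.shape)  # torch.Size([1, 2, 64, 64])
```

Returning both the pooled tensor and the pre-pooling tensor from the downsampling block keeps the skip features available without a second forward pass.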

U-Net Decoder Step in PyTorch

This code illustrates one decoder step: the decoder features are upsampled, then concatenated with the matching encoder features along the channel dimension before convolution.

<pre><code class="language-python">import torch
import torch.nn as nn


class UNetDecoderStep(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_channels * 2, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x_dec, x_enc):
        x_dec = self.up(x_dec)
        # Concatenate along the channel dimension
        x = torch.cat([x_dec, x_enc], dim=1)
        return self.conv(x)


# Simulated encoder and decoder features
enc_feat = torch.randn(1, 64, 32, 32)
dec_feat = torch.randn(1, 128, 16, 16)

step = UNetDecoderStep(in_channels=128, out_channels=64)
print(step(dec_feat, enc_feat).shape)  # torch.Size([1, 64, 32, 32])</code></pre>