Facial Recognition Pipelines (Siamese Networks)

Facial recognition is usually framed as a metric learning task rather than standard classification. Siamese networks learn a mapping function that projects face images into a low-dimensional embedding space where distances reflect identity: embeddings of the same person lie close together, and embeddings of different people lie far apart.

Siamese Network Architecture

A Siamese network consists of two identical CNN branches that share a single set of weights. It accepts two images and outputs one low-dimensional embedding vector for each.
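
The weight sharing described above can be sketched in PyTorch by routing both inputs through a single module. This is a minimal illustration, not a production face model: the class names EmbeddingNet and SiameseNet, the layer sizes, the 64x64 input resolution, and the L2 normalization of the output are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingNet(nn.Module):
    """Small CNN mapping an image to an embedding (hypothetical layout)."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> [batch, 32, 1, 1]
        )
        self.fc = nn.Linear(32, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x).flatten(1)          # -> [batch, 32]
        return F.normalize(self.fc(x), dim=1)    # unit-length embeddings (optional choice)


class SiameseNet(nn.Module):
    """Both inputs pass through the same branch, so the weights are shared."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.branch = EmbeddingNet(embedding_dim)

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        return self.branch(x1), self.branch(x2)


torch.manual_seed(0)
model = SiameseNet().eval()
img_a = torch.randn(2, 3, 64, 64)  # simulated image batches
img_b = torch.randn(2, 3, 64, 64)

with torch.no_grad():
    emb_a, emb_b = model(img_a, img_b)
    same_1, same_2 = model(img_a, img_a)

print(emb_a.shape, emb_b.shape)
```

Keeping one branch module and calling it twice avoids having to synchronize two copies of the parameters during training.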

Shared Weights & Similarity

Because both branches share weights, identical inputs yield identical embeddings. The similarity between two faces is measured from their embedding vectors, using either cosine similarity or a Euclidean distance such as the squared form d = ||f(x_1) - f(x_2)||^2, where f is the shared embedding function. A smaller d indicates more similar faces.
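
As a small worked example, the two measures can be computed on hand-picked embeddings. The vectors below are invented for illustration and are already unit length, in which case the squared Euclidean distance and cosine similarity are related by d = 2 - 2cos.

```python
import torch
import torch.nn.functional as F

# Two made-up unit-length embeddings, shape [batch=1, dim=2]
f_x1 = torch.tensor([[0.6, 0.8]])
f_x2 = torch.tensor([[1.0, 0.0]])

# Squared Euclidean distance: ||f(x_1) - f(x_2)||^2
sq_euclidean = (f_x1 - f_x2).pow(2).sum(dim=1)

# Cosine similarity: higher means more similar
cosine = F.cosine_similarity(f_x1, f_x2, dim=1)

print(sq_euclidean.item(), cosine.item())
```

With normalized embeddings the two measures rank pairs identically, so the choice between them is mostly a matter of convention.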

Face Verification vs. Identification

Verification (1:1 matching) compares an input face against a template to confirm identity. Identification (1:N matching) compares the input face against a database of templates to find the matching identity.
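
The difference between the two modes can be sketched as follows. The function names verify and identify, the threshold value, the labels, and the toy two-dimensional gallery are all hypothetical; in practice the threshold would be tuned on validation data.

```python
import torch


def verify(probe: torch.Tensor, template: torch.Tensor, threshold: float) -> bool:
    """1:1 matching: accept if the squared Euclidean distance is below the threshold."""
    return (probe - template).pow(2).sum().item() < threshold


def identify(probe: torch.Tensor, gallery: torch.Tensor, labels: list, threshold: float):
    """1:N matching: return the closest label, or None if no template is close enough."""
    dists = (gallery - probe).pow(2).sum(dim=1)  # distance to every template
    best = torch.argmin(dists).item()
    return labels[best] if dists[best].item() < threshold else None


# Toy gallery of enrolled templates (one embedding per identity)
gallery = torch.tensor([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
labels = ["alice", "bob", "carol"]
probe = torch.tensor([0.1, 0.9])

print(verify(probe, gallery[1], threshold=0.5))        # compare against one claimed identity
print(identify(probe, gallery, labels, threshold=0.5)) # search the whole gallery
```

Returning None when nothing falls under the threshold lets the identification step reject faces that are not enrolled instead of forcing a match.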

Metric Learning Loss Functions

Training Siamese networks requires loss functions that pull embeddings of the same person together and push embeddings of different people apart.

Contrastive and Triplet Loss

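Contrastive loss operates on pairs: it penalizes the distance between embeddings of the same identity, and penalizes different-identity pairs only when they lie closer than a margin. Triplet loss uses three inputs: an anchor (A), a positive (P, same identity), and a negative (N, different identity). It enforces d(A, P) + margin < d(A, N), so the negative must be farther from the anchor than the positive by at least the margin.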
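
For the pairwise case, one common formulation of contrastive loss is same * d^2 + (1 - same) * max(margin - d, 0)^2, where same is 1 for matching identities and 0 otherwise. The sketch below assumes that formulation (some variants add a factor of 1/2); the function name contrastive_loss and the toy inputs are illustrative only.

```python
import torch
import torch.nn.functional as F


def contrastive_loss(emb1: torch.Tensor, emb2: torch.Tensor,
                     same: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Pairwise contrastive loss; `same` holds 1.0 for same-identity pairs, else 0.0."""
    d = F.pairwise_distance(emb1, emb2)                        # Euclidean distance per pair
    pull = same * d.pow(2)                                     # same identity: shrink distance
    push = (1 - same) * torch.clamp(margin - d, min=0).pow(2)  # different: push out to the margin
    return (pull + push).mean()


# Pair 0: same identity but far apart (distance about 5) -> large penalty
# Pair 1: different identities but nearly identical embeddings -> penalty near margin^2
emb1 = torch.tensor([[0.0, 0.0], [0.0, 0.0]])
emb2 = torch.tensor([[3.0, 4.0], [0.0, 0.0]])
same = torch.tensor([1.0, 0.0])

loss = contrastive_loss(emb1, emb2, same, margin=1.0)
print(loss.item())
```

Different-identity pairs that are already farther apart than the margin contribute zero, so training effort concentrates on pairs that are still confusable.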

Triplet Loss Implementation in PyTorch

PyTorch includes a built-in TripletMarginLoss to optimize embedding spaces.

<pre><code class="language-python">import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)

# Simulated embeddings: [batch_size, embedding_dim]
anchor = torch.randn(4, 128, requires_grad=True)
positive = torch.randn(4, 128, requires_grad=True)
negative = torch.randn(4, 128, requires_grad=True)

loss = triplet_loss(anchor, positive, negative)
loss.backward()
print(loss.item())  # Scalar loss value</code></pre>