Generative Adversarial Networks (GANs) Introduction

Generative Adversarial Networks (GANs), introduced by Ian Goodfellow and colleagues in 2014, formulate generative modeling as a two-player zero-sum game. A Generator network learns to produce realistic data to fool a Discriminator network, which simultaneously learns to distinguish real data from generated samples.


The Minimax Game Formulation

GANs optimize a minimax objective where a generator and discriminator compete, driving the model toward learning the data distribution.

Two-Player Zero-Sum Game

GANs consist of two competing networks: a Generator \\( G \\) and a Discriminator \\( D \\). The generator takes a random noise vector \\( \\mathbf{z} \\) (drawn from a prior distribution like a uniform or Gaussian distribution \\( p_z \\)) and maps it to the data space: \\( G(\\mathbf{z}) \\). The discriminator takes a data sample \\( \\mathbf{x} \\) (which can be a real sample from the dataset or a fake sample from the generator) and outputs a probability score \\( D(\\mathbf{x}) \\in [0, 1] \\) indicating whether the sample is real.

This framework is modeled as a zero-sum minimax game. The discriminator is trained to maximize the probability of assigning the correct label to both real and fake samples, while the generator is trained to minimize the probability that the discriminator identifies its samples as fake. The competition drives both networks to improve, leading the generator to produce highly realistic samples.

Objective Function

The mathematical objective of the minimax game is formulated as a value function \\( V(D, G) \\):

\\( \\min_G \\max_D V(D, G) = \\mathbb{E}_{\\mathbf{x} \\sim p_{data}}[\\log D(\\mathbf{x})] + \\mathbb{E}_{\\mathbf{z} \\sim p_z}[\\log (1 - D(G(\\mathbf{z})))] \\)

The first term represents the expectation of the log-probability that \\( D \\) correctly identifies real samples. The second term represents the expectation of the log-probability that \\( D \\) identifies generated samples as fake. The discriminator updates its weights to maximize \\( V(D, G) \\), while the generator updates its weights to minimize it. The game reaches a Nash equilibrium when the generator's distribution matches the data distribution: \\( p_g = p_{data} \\), at which point the optimal discriminator outputs \\( D(\\mathbf{x}) = 0.5 \\) everywhere.
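As a rough numerical illustration of the value function, the sketch below estimates \\( V(D, G) \\) from lists of discriminator outputs. The helper name `value_function` and the probability values are assumptions made up for this example; no networks are involved.

```python
import math


def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from discriminator outputs.

    d_real: D(x) on real samples; d_fake: D(G(z)) on generated samples.
    Hypothetical helper, written only for this illustration.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# At the equilibrium the discriminator outputs 0.5 for every sample,
# so V = log(0.5) + log(0.5) = -log(4).
v_equilibrium = value_function([0.5] * 4, [0.5] * 4)

# A confident, correct discriminator (made-up outputs) pushes V toward 0,
# which is the upper bound of the value function.
v_confident = value_function([0.99] * 4, [0.01] * 4)

print(v_equilibrium, v_confident)
```

The discriminator gains value by separating the two sets of samples, and the generator can only lower it by making the discriminator less certain.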

PyTorch Vanilla GAN Implementation

A simple PyTorch implementation defines fully connected generator and discriminator networks and shows their forward passes and loss calculations.

Model Definitions

The following PyTorch code implements the Generator and Discriminator networks for a simple fully connected GAN:

<pre><code class="language-python">import torch
import torch.nn as nn


class Generator(nn.Module):
    def __init__(self, latent_dim, img_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 256),
            nn.BatchNorm1d(256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, img_dim),
            nn.Tanh(),  # Maps output pixels to [-1, 1]
        )

    def forward(self, z):
        # z shape: [batch_size, latent_dim]
        return self.net(z)  # [batch_size, img_dim]


class Discriminator(nn.Module):
    def __init__(self, img_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 1),
            nn.Sigmoid(),  # Outputs probability score [0, 1]
        )

    def forward(self, img):
        # img shape: [batch_size, img_dim]
        return self.net(img)  # [batch_size, 1]
</code></pre>

Training Step

This PyTorch code demonstrates a single training iteration for both the discriminator and generator using Binary Cross-Entropy loss:

<pre><code class="language-python">def train_gan_step(generator, discriminator, g_opt, d_opt, bce_loss, real_imgs, latent_dim):
    batch_size = real_imgs.size(0)
    device = real_imgs.device

    # Labels for classification
    real_labels = torch.ones(batch_size, 1, device=device)
    fake_labels = torch.zeros(batch_size, 1, device=device)

    # -----------------------
    # Train Discriminator: max E[log D(x)] + E[log(1 - D(G(z)))]
    # -----------------------
    d_opt.zero_grad()

    # Loss on real images
    outputs_real = discriminator(real_imgs)
    d_loss_real = bce_loss(outputs_real, real_labels)

    # Loss on fake images
    z = torch.randn(batch_size, latent_dim, device=device)
    fake_imgs = generator(z)
    outputs_fake = discriminator(fake_imgs.detach())  # detach prevents backprop to Generator
    d_loss_fake = bce_loss(outputs_fake, fake_labels)

    d_loss = d_loss_real + d_loss_fake
    d_loss.backward()
    d_opt.step()

    # -----------------------
    # Train Generator: min E[log(1 - D(G(z)))] -> max E[log D(G(z))]
    # -----------------------
    g_opt.zero_grad()

    # Re-evaluate fake images with Generator active
    outputs_fake_evaluated = discriminator(fake_imgs)

    # We use real labels here because the generator wants to maximize D(G(z))
    g_loss = bce_loss(outputs_fake_evaluated, real_labels)
    g_loss.backward()
    g_opt.step()

    return d_loss.item(), g_loss.item()
</code></pre>
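To see why Binary Cross-Entropy with these label choices corresponds to the objective, the sketch below works through a single real and a single fake sample in plain Python. The `bce` helper is a hand-written stand-in for the per-sample formula (an assumption for illustration, not a PyTorch call), and the two probabilities are made up.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for one prediction (hypothetical helper)."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


# Made-up discriminator outputs for one real and one generated sample.
d_real, d_fake = 0.9, 0.2

# Discriminator loss: real sample labelled 1, fake sample labelled 0.
d_loss = bce(d_real, 1) + bce(d_fake, 0)

# The same two samples plugged into the value function V(D, G).
value = math.log(d_real) + math.log(1 - d_fake)

# Generator loss with "real" labels on the fake sample: -log D(G(z)).
g_loss = bce(d_fake, 1)

print(d_loss, value, g_loss)
```

Minimizing `d_loss` is therefore the same as maximizing the value function on these samples, while the generator step with real labels minimizes \\( -\\log D(G(\\mathbf{z})) \\). A batched loss such as `nn.BCELoss` with its default mean reduction averages these per-sample terms.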

Training Dynamics and Optimization

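With an optimal discriminator, the minimax objective reduces to minimizing the Jensen-Shannon divergence between the data and generator distributions. In practice, the generator loss is adapted to avoid gradient saturation early in training.

Optimal Discriminator and Jensen-Shannon Divergence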

In their paper, Goodfellow et al. proved that for a fixed generator, the optimal discriminator is: \\( D^*_G(\\mathbf{x}) = \\frac{p_{data}(\\mathbf{x})}{p_{data}(\\mathbf{x}) + p_g(\\mathbf{x})} \\). By substituting this optimal discriminator back into the minimax value function, the optimization objective simplifies to: \\( V(D^*_G, G) = -\\log(4) + 2 \\cdot D_{JS}(p_{data} \\parallel p_g) \\), where \\( D_{JS} \\) is the Jensen-Shannon Divergence (JSD).
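The substitution step can be written out as follows. This is a sketch of the standard argument; the mixture \\( m = \\frac{1}{2}(p_{data} + p_g) \\) and the Kullback-Leibler divergence \\( D_{KL} \\) are shorthand introduced here, not notation from the text above.

\\( V(D^*_G, G) = \\mathbb{E}_{\\mathbf{x} \\sim p_{data}}\\left[\\log \\frac{p_{data}(\\mathbf{x})}{p_{data}(\\mathbf{x}) + p_g(\\mathbf{x})}\\right] + \\mathbb{E}_{\\mathbf{x} \\sim p_g}\\left[\\log \\frac{p_g(\\mathbf{x})}{p_{data}(\\mathbf{x}) + p_g(\\mathbf{x})}\\right] \\)

\\( = \\mathbb{E}_{\\mathbf{x} \\sim p_{data}}\\left[\\log \\frac{p_{data}(\\mathbf{x})}{2\\, m(\\mathbf{x})}\\right] + \\mathbb{E}_{\\mathbf{x} \\sim p_g}\\left[\\log \\frac{p_g(\\mathbf{x})}{2\\, m(\\mathbf{x})}\\right] \\)

\\( = -\\log(4) + D_{KL}(p_{data} \\parallel m) + D_{KL}(p_g \\parallel m) = -\\log(4) + 2 \\cdot D_{JS}(p_{data} \\parallel p_g) \\)

Each expectation contributes one \\( -\\log(2) \\) term, and the last equality uses the definition \\( D_{JS}(p \\parallel q) = \\frac{1}{2} D_{KL}(p \\parallel m) + \\frac{1}{2} D_{KL}(q \\parallel m) \\).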

Thus, when the discriminator is optimal, training the generator under the minimax game is mathematically equivalent to minimizing the JSD between the generator's distribution and the true data distribution. The JSD is symmetric and bounded between 0 and \\( \\log(2) \\), so it is always well defined. However, when the two distributions have disjoint supports, the JSD is constant at \\( \\log(2) \\) regardless of how far apart they are, so its gradient with respect to the generator vanishes.
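These properties can be checked numerically on small discrete distributions. The sketch below is illustrative only: the helper names (`kl`, `jsd`, `value_at_optimal_d`) and the example distributions are assumptions chosen for this demonstration.

```python
import math


def kl(p, q):
    """KL divergence between discrete distributions (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence via the mixture m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def value_at_optimal_d(p_data, p_g):
    """V(D*, G) with D*(x) = p_data(x) / (p_data(x) + p_g(x))."""
    total = 0.0
    for pd, pg in zip(p_data, p_g):
        if pd > 0:
            total += pd * math.log(pd / (pd + pg))
        if pg > 0:
            total += pg * math.log(pg / (pd + pg))
    return total


# Overlapping distributions (made-up values): check V(D*, G) = -log 4 + 2 JSD.
p_data = [0.5, 0.3, 0.2, 0.0]
p_g = [0.1, 0.2, 0.3, 0.4]
overlap_jsd = jsd(p_data, p_g)
identity_gap = value_at_optimal_d(p_data, p_g) - (-math.log(4) + 2 * overlap_jsd)

# Disjoint supports: point masses at neighbouring versus distant positions.
near = jsd([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
far = jsd([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])

print(overlap_jsd, identity_gap, near, far)
```

Both disjoint cases give exactly \\( \\log(2) \\), so moving the generator's mass closer to the data without creating overlap does not change the JSD, which is the source of the vanishing gradient.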

Non-Saturating Generator Loss

In the early stages of training, the generator is weak, and the discriminator can easily identify generated samples as fake. Under the original minimax objective \\( \\min \\log(1 - D(G(\\mathbf{z}))) \\), when \\( D(G(\\mathbf{z})) \\) is close to 0, the loss saturates: its gradient with respect to the discriminator's pre-sigmoid output approaches zero, so the generator receives almost no learning signal and struggles to improve.

To mitigate this gradient saturation problem, the generator is commonly trained with the non-saturating loss: \\( \\max \\log D(G(\\mathbf{z})) \\) (or equivalently, minimizing \\( -\\log D(G(\\mathbf{z})) \\)). This formulation changes the generator's target: instead of training the generator to avoid being caught, we train it to maximize the probability that the discriminator classifies its outputs as real. This shift provides strong gradients early in training, when the generator needs them most.
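The difference between the two losses is easiest to see in their gradients with respect to the discriminator's pre-sigmoid output (the logit \\( a \\), with \\( D = \\sigma(a) \\)). The sketch below uses the closed-form derivatives; the function names and the logit value of -5 are assumptions picked for illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def saturating_grad(a):
    """d/da of log(1 - sigmoid(a)), the original minimax generator loss."""
    return -sigmoid(a)


def non_saturating_grad(a):
    """d/da of -log(sigmoid(a)), the non-saturating generator loss."""
    return -(1.0 - sigmoid(a))


# Early in training the discriminator confidently rejects fakes:
# a logit of -5 means D(G(z)) = sigmoid(-5), roughly 0.007.
early_logit = -5.0
sat = saturating_grad(early_logit)
non_sat = non_saturating_grad(early_logit)

print(sat, non_sat)
```

Both gradients are negative, so gradient descent pushes the logit up in either case, but at this point the non-saturating gradient is more than a hundred times larger in magnitude than the saturating one.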