Building a Basic CNN from Scratch

Building a custom CNN in PyTorch involves stacking convolutional, activation, pooling, and dense layers to perform image classification.

Architecture Design

A standard CNN architecture consists of alternating convolutional and pooling layers, followed by fully connected layers.

Stacking Conv-ReLU-Pool Blocks

Each block pairs a convolutional layer, which extracts features, with a ReLU activation, which introduces non-linearity, and a pooling layer, which downsamples the spatial dimensions.

As the data flows deeper into the network, the spatial dimensions shrink while the channel depth increases, allowing the model to represent a larger number of complex, high-level features.
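The shrinking spatial size can be checked by hand with the standard output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The following is a minimal pure-Python sketch, assuming a 32x32 RGB input, 3x3 convolutions with padding 1, and 2x2 max pooling with stride 2. The helper names `out_size` and `trace_shapes` and the channel widths are illustrative choices, not part of any library.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1 assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(size=32, channels=(3, 16, 32)):
    """Return (channels, height, width) after each Conv-ReLU-Pool block."""
    shapes = [(channels[0], size, size)]
    for out_channels in channels[1:]:
        # 3x3 convolution with padding 1 preserves the spatial size
        size = out_size(size, kernel=3, stride=1, padding=1)
        # 2x2 max pooling with stride 2 halves the spatial size
        size = out_size(size, kernel=2, stride=2)
        shapes.append((out_channels, size, size))
    return shapes


for shape in trace_shapes():
    print(shape)
```

Each block halves the height and width while the channel count grows, which is the trade-off described above.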

Transitioning from Features to Classifier

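After the final feature extraction block, each sample's 3D feature map (channels, height, width) is flattened into a vector and passed to the classifier. The classifier consists of one or more fully connected layers that map the features to class scores (logits), which a softmax can convert to class probabilities.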
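The hand-off from feature map to classifier can be sketched in a few lines. This example assumes a hypothetical feature map of shape [4, 32, 8, 8] and a single linear layer standing in for the classifier head. The raw outputs of such a head are usually treated as logits, and a softmax over the class dimension turns them into probabilities.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical feature map: batch of 4, 32 channels, 8x8 spatial grid
feature_map = torch.randn(4, 32, 8, 8)

# nn.Flatten() defaults to start_dim=1, so the batch dimension is preserved
flat = nn.Flatten()(feature_map)
print(flat.shape)

# A single linear layer standing in for the classifier head (illustrative)
head = nn.Linear(32 * 8 * 8, 10)
logits = head(flat)

# Softmax over the class dimension converts logits to probabilities
probs = torch.softmax(logits, dim=1)
print(probs.sum(dim=1))  # each row sums to approximately 1
```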

To reduce overfitting, dropout is often applied between the fully connected layers. It randomly zeroes activations during training, which discourages units from co-adapting and pushes the model toward robust feature combinations.
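Dropout behaves differently in training and evaluation mode, which is easy to overlook. A small sketch, assuming `p=0.5` and an all-ones input chosen purely for readability:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()          # training mode: units are zeroed at random
train_out = drop(x)   # surviving values are scaled by 1 / (1 - p) = 2.0

drop.eval()           # evaluation mode: dropout passes inputs through unchanged
eval_out = drop(x)

print(train_out)
print(eval_out)
```

The scaling of surviving units during training keeps the expected activation the same in both modes, so no rescaling is needed at inference time.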

Implementing the CNN in PyTorch

Let's implement a complete CNN module in PyTorch, highlighting the data flow and the tensor shape changes along the way.

Defining the nn.Module Subclass

We can define our custom CNN class by subclassing nn.Module and registering the layers in the constructor. We will comment on the output tensor shapes at each step.

<pre><code class="language-python">import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # Input: [batch, 3, 32, 32]
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # [batch, 16, 32, 32]
            nn.ReLU(),
            nn.MaxPool2d(2, 2),                           # [batch, 16, 16, 16]
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # [batch, 32, 16, 16]
            nn.ReLU(),
            nn.MaxPool2d(2, 2),                           # [batch, 32, 8, 8]
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                 # [batch, 2048]
            nn.Linear(32 * 8 * 8, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, num_classes),                  # [batch, num_classes]
        )

    def forward(self, x):
        x = self.features(x)
        logits = self.classifier(x)
        return logits


x = torch.randn(4, 3, 32, 32)
model = SimpleCNN(num_classes=10)
out = model(x)
print("Output shape:", out.shape)  # [4, 10]
</code></pre>

The forward pass runs successfully, returning output logits of shape [4, 10]. The comments help verify that our flattening dimension matches the output of the convolutional feature extractor.
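Rather than relying on comments alone, the flattened size can also be measured by pushing a dummy input through the feature extractor. The sketch below rebuilds the same layer layout as the feature extractor above so that it runs on its own. The variable name `flat_features` is an illustrative choice.

```python
import torch
import torch.nn as nn

# Same layer layout as the feature extractor above, rebuilt here so the
# snippet is self-contained
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
)

# Run one dummy image through the extractor without tracking gradients
with torch.no_grad():
    dummy = torch.zeros(1, 3, 32, 32)
    flat_features = features(dummy).flatten(start_dim=1).shape[1]

print(flat_features)  # 2048, i.e. 32 * 8 * 8
```

If the input resolution or the layer stack changes, the measured value can be passed to the first `nn.Linear` instead of a hand-computed constant.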

Training Loop Configuration

To train this CNN, we need to define a loss function (like CrossEntropyLoss) and an optimizer (like SGD or Adam). The optimizer updates the model's weights using the computed gradients.
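One detail worth checking is that `nn.CrossEntropyLoss` expects raw logits and integer class indices, since it applies log-softmax internally. The sketch below, using random stand-in logits and labels, compares it with the explicit two-step form:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # raw, unnormalized scores
labels = torch.randint(0, 10, (4,))  # integer class indices

loss_ce = nn.CrossEntropyLoss()(logits, labels)

# Equivalent two-step form: log-softmax followed by negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss_ce, loss_manual))
```

This is why the model's `forward` method returns logits rather than applying a softmax itself.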

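<pre><code class="language-python">import torch
import torch.nn as nn
import torch.optim as optim

# SimpleCNN is the class defined in the previous block
model = SimpleCNN(num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Simulated batch
inputs = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

# Single training step
optimizer.zero_grad()              # clear gradients from any previous step
outputs = model(inputs)            # forward pass: [4, 10] logits
loss = criterion(outputs, labels)  # scalar loss
loss.backward()                    # backward pass: compute gradients
optimizer.step()                   # update the weights

print("Training step loss:", loss.item())
</code></pre>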

This is a single optimization iteration: the optimizer clears stale gradients, the forward pass computes predictions, the loss function measures the error against the labels, the backward pass computes gradients, and the optimizer step applies the weight update.
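In practice the single step is wrapped in a loop over batches and epochs. The following is a minimal sketch on synthetic data, assuming a `DataLoader` over random tensors and a tiny stand-in model so the snippet runs on its own. The dataset, the batch size, the epoch count, and the stand-in architecture are all illustrative, and `SimpleCNN` could be swapped in for the model.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for a real dataset: 64 random 3x32x32 "images"
images = torch.randn(64, 3, 32, 32)
targets = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(images, targets), batch_size=16, shuffle=True)

# Tiny stand-in model (illustrative); replace with SimpleCNN if it is defined
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),          # [batch, 8, 16, 16]
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

epoch_losses = []
for epoch in range(3):
    model.train()                # enable training-time behavior such as dropout
    running_loss = 0.0
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * inputs.size(0)
    epoch_losses.append(running_loss / len(loader.dataset))
    print(f"epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")

# Evaluation: switch modes and disable gradient tracking
model.eval()
with torch.no_grad():
    predictions = model(images).argmax(dim=1)
accuracy = (predictions == targets).float().mean().item()
print(f"accuracy on the synthetic data: {accuracy:.2f}")
```

Because the labels here are random, the reported accuracy only reflects memorization of the synthetic samples and says nothing about generalization.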