Building a Basic CNN from Scratch
Building a custom CNN in PyTorch involves stacking convolutional, activation, pooling, and dense layers to perform image classification.
Architecture Design
A standard CNN architecture consists of alternating convolutional and pooling layers, followed by fully connected layers.
Stacking Conv-ReLU-Pool Blocks
Within each block, the convolutional layer extracts local features, the ReLU activation introduces non-linearity, and the pooling layer downsamples the spatial dimensions.
As the data flows deeper into the network, the spatial dimensions shrink while the channel depth increases, allowing the model to represent a larger number of complex, high-level features.
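To make the shrinking spatial size and growing channel depth concrete, here is a minimal sketch that traces a batch through two Conv-ReLU-Pool blocks. The input size (32×32 RGB) and the channel widths (16 and 32) are illustrative assumptions, not values fixed by the text.

```python
import torch
import torch.nn as nn

# Two Conv-ReLU-Pool blocks; the channel widths (16, 32) are illustrative assumptions.
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # padding=1 keeps 32x32; channels 3 -> 16
    nn.ReLU(),
    nn.MaxPool2d(2),                              # halves height and width: 32 -> 16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # keeps 16x16; channels 16 -> 32
    nn.ReLU(),
    nn.MaxPool2d(2),                              # halves again: 16 -> 8
)

x = torch.randn(4, 3, 32, 32)  # batch of 4 RGB images, 32x32 pixels
shapes = [tuple(x.shape)]
for layer in features:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(f"{layer.__class__.__name__:<10} -> {tuple(x.shape)}")
```

Only the pooling layers change the spatial size here, because the 3×3 convolutions use `padding=1`. Only the convolutions change the channel count.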
Transitioning from Features to Classifier
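After the final feature extraction block, each sample's 3D feature map (channels × height × width) is flattened into a vector and passed to the classifier. The classifier consists of one or more fully connected layers that map the features to one score (logit) per class; a softmax converts these logits into class probabilities.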
To reduce overfitting, dropout is often applied between the fully connected layers. During training it randomly zeroes a fraction of the activations, which keeps the model from relying on any single unit and encourages it to learn robust feature combinations.
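The flatten-then-classify step can be sketched as follows. The feature-map shape (64 channels, 8×8), the hidden width of 128, and the dropout rate of 0.5 are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical output of the last pooling layer: 64 channels, 8x8 spatial size.
feature_maps = torch.randn(4, 64, 8, 8)

classifier = nn.Sequential(
    nn.Flatten(),                # (4, 64, 8, 8) -> (4, 4096)
    nn.Linear(64 * 8 * 8, 128),  # in_features must equal channels * height * width
    nn.ReLU(),
    nn.Dropout(p=0.5),           # zeroes activations at random, only in training mode
    nn.Linear(128, 10),          # one logit per class
)

classifier.eval()                # disable dropout for a deterministic forward pass
with torch.no_grad():
    logits = classifier(feature_maps)
    probs = torch.softmax(logits, dim=1)  # softmax turns logits into probabilities

print(logits.shape)
print(probs.sum(dim=1))          # each row of probabilities sums to 1
```

Dropout is active only after `model.train()` and is a no-op after `model.eval()`, so remember to switch modes between training and inference.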
Implementing the CNN in PyTorch
Let's implement a complete CNN module in PyTorch, highlighting the data flow and the tensor shape at each stage.
Defining the nn.Module Subclass
We can define our custom CNN class by subclassing nn.Module and registering the layers in the constructor. We will comment on the output tensor shapes at each step.
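A minimal sketch of such a module is given below. It assumes 32×32 RGB inputs and 10 classes. The two-block layout, the channel widths (32 and 64), the hidden size (128), and the dropout rate are illustrative assumptions, not a prescribed design.

```python
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Feature extractor: two Conv-ReLU-Pool blocks (channel widths are assumptions)
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # [B, 3, 32, 32] -> [B, 32, 32, 32]
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> [B, 32, 16, 16]
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> [B, 64, 16, 16]
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> [B, 64, 8, 8]
        )
        # Classifier: flatten, then two fully connected layers with dropout between them
        self.classifier = nn.Sequential(
            nn.Flatten(),                                 # -> [B, 64 * 8 * 8] = [B, 4096]
            nn.Linear(64 * 8 * 8, 128),                   # -> [B, 128]
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(128, num_classes),                  # -> [B, num_classes]
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)  # raw logits, no softmax applied


model = SimpleCNN(num_classes=10)
x = torch.randn(4, 3, 32, 32)  # batch of 4 RGB images, 32x32 pixels
out = model(x)
print(out.shape)  # torch.Size([4, 10])
```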
The forward pass runs successfully, returning output logits of shape [4, 10]. The comments help verify that our flattening dimension matches the output of the convolutional feature extractor.
Training Loop Configuration
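To train this CNN, we need to define a loss function (like CrossEntropyLoss) and an optimizer (like SGD or Adam). CrossEntropyLoss applies log-softmax internally, so the model should output raw logits and the labels should be integer class indices. The optimizer updates the model's weights using the computed gradients.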
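<pre><code class="language-python">import torch
import torch.nn as nn
import torch.optim as optim

# Assumes the SimpleCNN class from the previous step is in scope
model = SimpleCNN(num_classes=10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Simulated batch: 4 RGB images of 32x32 pixels with integer class labels
inputs = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

# Single training step
optimizer.zero_grad()              # clear gradients from the previous step
outputs = model(inputs)            # forward pass: logits of shape [4, 10]
loss = criterion(outputs, labels)  # compare logits with the target labels
loss.backward()                    # backward pass: compute gradients
optimizer.step()                   # update the weights

print("Training step loss:", loss.item())
</code></pre>
This training step is a single optimization iteration. The optimizer clears the gradients left over from the previous step, the forward pass computes predictions, the loss function measures the error between predictions and labels, and the backward pass computes the gradients that the optimizer then uses to update the weights.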
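In practice the single step above is repeated over mini-batches for several epochs. The sketch below shows one way to arrange that loop. The synthetic dataset, the batch size, the epoch count, and the small stand-in model are all assumptions made so the snippet runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data (assumption): 64 random 32x32 RGB images with random labels.
dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Small stand-in model so the snippet is self-contained; swap in your own CNN in practice.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # [B, 3, 32, 32] -> [B, 8, 32, 32]
    nn.ReLU(),
    nn.MaxPool2d(2),                            # -> [B, 8, 16, 16]
    nn.Flatten(),                               # -> [B, 2048]
    nn.Linear(8 * 16 * 16, 10),                 # -> [B, 10] logits
)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

epoch_losses = []
model.train()  # enable training-mode behavior (matters for dropout and batch norm)
for epoch in range(3):
    running = 0.0
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        running += loss.item() * inputs.size(0)  # weight the batch mean by batch size
    epoch_losses.append(running / len(dataset))
    print(f"epoch {epoch + 1}: mean loss {epoch_losses[-1]:.4f}")
```

Because the labels here are random, the reported loss only demonstrates the mechanics of the loop and says nothing about real model quality.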