Model Evaluation and Saving/Loading in PyTorch

Model evaluation measures a trained model's performance on held-out test data, while serialization saves model parameters to disk and loads them back. Handling state dictionaries and device mapping correctly is essential when deploying models in production.


Evaluation Protocols

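Evaluation mode switches layers such as dropout and batch normalization to their inference behavior, and a no-grad context turns off gradient tracking. Together they make performance measurements accurate and efficient.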

Evaluation Mode and No-Grad Context

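To evaluate a trained model, we set it to evaluation mode using model.eval(). This disables dropout and makes batch normalization layers normalize with the running mean and variance accumulated during training instead of the current batch's statistics, so predictions are deterministic and do not depend on how the test data is batched. Note that eval() does not freeze parameters or stop gradient computation; it only changes the behavior of such layers.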
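A minimal sketch of what model.eval() changes, using a toy model whose layer sizes and random seed are arbitrary choices for illustration. It checks that evaluation-mode outputs are deterministic and that batch normalization's running statistics stop updating:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A toy model with one dropout layer and one batch-norm layer
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.BatchNorm1d(4))
x = torch.randn(8, 4)

# Training mode: dropout zeroes random activations and batch norm normalizes
# with the current batch's statistics while updating its running stats
model.train()
model(x)

# Evaluation mode: dropout is a no-op and batch norm uses running_mean/running_var
model.eval()
running_mean_before = model[2].running_mean.clone()
out_eval_1 = model(x)
out_eval_2 = model(x)

print(model.training)                                            # False
print(torch.equal(out_eval_1, out_eval_2))                       # True: deterministic
print(torch.equal(running_mean_before, model[2].running_mean))   # True: stats not updated
```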

We wrap the evaluation in a with torch.no_grad() block. Inside it, autograd does not record operations, so no computation graph is built for a backward pass; this reduces memory use (on GPU or CPU) and speeds up evaluation.
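A short sketch contrasting a forward pass with and without torch.no_grad() on a toy nn.Linear model (sizes chosen arbitrarily):

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 2)
model.eval()
x = torch.randn(3, 5)

# Outside no_grad, autograd records the operation so it can be backpropagated
out_tracked = model(x)

# Inside no_grad, no graph is recorded: the output has no grad_fn
with torch.no_grad():
    out_untracked = model(x)

print(out_tracked.requires_grad, out_tracked.grad_fn is not None)    # True True
print(out_untracked.requires_grad, out_untracked.grad_fn is None)    # False True
# Parameters still require gradients; no_grad only stops graph recording
print(all(p.requires_grad for p in model.parameters()))              # True
```

The numerical outputs are the same in both cases; only the bookkeeping differs. Recent PyTorch versions also offer torch.inference_mode(), a stricter context manager with the same purpose.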

Metrics Calculation

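During evaluation, we compute performance metrics such as accuracy, precision, recall, and F1-score. Classification models usually output raw logits, so we apply an activation function to obtain probabilities: softmax for multi-class outputs (followed by argmax to pick the predicted class) or sigmoid for binary and multi-label outputs (followed by a threshold such as 0.5). Because softmax preserves the ordering of the logits, argmax over the raw logits yields the same class labels.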
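The logits-to-labels step might look like this for both cases; the logit values are made up for illustration, and 0.5 is a conventional threshold rather than a fixed rule:

```python
import torch

# Multi-class: logits of shape (batch, num_classes)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
probs = torch.softmax(logits, dim=1)   # each row sums to 1
preds = probs.argmax(dim=1)            # predicted class indices
# softmax is monotonic, so argmax over the raw logits gives the same labels
assert torch.equal(preds, logits.argmax(dim=1))
print(preds)  # tensor([0, 2])

# Binary / multi-label: one logit per output, then sigmoid and a threshold
bin_logits = torch.tensor([1.5, -0.3, 0.0, -2.0])
bin_probs = torch.sigmoid(bin_logits)
bin_preds = (bin_probs >= 0.5).long()
print(bin_preds)  # tensor([1, 0, 1, 0])
```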

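We accumulate predictions and labels across the entire test set and compute the metrics once at the end. This gives a more reliable estimate than any single batch, and for precision, recall, and F1 it also avoids averaging per-batch scores, which generally gives a different result than computing them over the full set. The resulting numbers indicate whether the model is ready for deployment.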
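One way to accumulate predictions over a test set and compute the metrics once, sketched for binary classification with a single-logit output. The helper name evaluate_binary, the toy data, and the untrained nn.Linear model are hypothetical; the metrics are computed by hand here rather than with a library such as scikit-learn:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate_binary(model, loader, threshold=0.5):
    """Hypothetical helper: accumulate predictions over the loader, then compute metrics once."""
    model.eval()
    all_preds, all_labels = [], []
    with torch.no_grad():
        for inputs, labels in loader:
            logits = model(inputs).squeeze(1)  # (batch, 1) -> (batch,)
            preds = (torch.sigmoid(logits) >= threshold).long()
            all_preds.append(preds)
            all_labels.append(labels.long())
    preds = torch.cat(all_preds)
    labels = torch.cat(all_labels)

    # Confusion-matrix counts over the whole test set
    tp = ((preds == 1) & (labels == 1)).sum().item()
    fp = ((preds == 1) & (labels == 0)).sum().item()
    fn = ((preds == 0) & (labels == 1)).sum().item()

    accuracy = (preds == labels).float().mean().item()
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data and an untrained model, only to exercise the function
torch.manual_seed(0)
test_ds = TensorDataset(torch.randn(100, 5), torch.randint(0, 2, (100,)))
test_loader = DataLoader(test_ds, batch_size=16)
model = nn.Linear(5, 1)
metrics = evaluate_binary(model, test_loader)
print(metrics)
```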

Serialization and Persistence

PyTorch saves and loads model parameters with torch.save() and torch.load(), most commonly applied to a model's state dictionary.

State Dictionary vs. Full Model Serialization

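PyTorch models can be saved in two ways: saving the entire model object or saving only its state dictionary (state_dict). The state_dict is an ordered Python dictionary that maps each parameter and persistent buffer name (for example 0.weight, or a batch normalization layer's running_mean) to its tensor. Saving the state_dict is the recommended approach: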

<pre><code class="language-python">import torch
model = torch.nn.Linear(5, 2)  # any nn.Module works the same way
# Recommended: save only the state_dict
torch.save(model.state_dict(), "model_weights.pth")</code></pre>
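To see what a state_dict actually contains, a toy model with a batch normalization layer can be inspected; the architecture is an arbitrary example. Keys are prefixed with the submodule's position in the nn.Sequential, and buffers such as running_mean are included alongside the learnable weights:

```python
import torch
import torch.nn as nn

# A model with both learnable parameters and batch-norm buffers
model = nn.Sequential(nn.Linear(5, 3), nn.BatchNorm1d(3))
state = model.state_dict()

for name, tensor in state.items():
    print(name, tuple(tensor.shape))
# 0.weight (3, 5)
# 0.bias (3,)
# 1.weight (3,)
# 1.bias (3,)
# 1.running_mean (3,)
# 1.running_var (3,)
# 1.num_batches_tracked ()
```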

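Saving the entire model with torch.save(model, path) pickles the model object. Pickle does not store the class code, only a reference to where the class is defined, so loading fails if the class is renamed, moved to another module, or the project's directory structure changes. A state_dict contains only tensors, so it can be loaded into any instance of a compatible architecture, which keeps saved weights usable as the code evolves. It can also be loaded with torch.load(path, weights_only=True), which restricts unpickling to tensors and basic Python types and is the safer choice for files from untrusted sources.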
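The two approaches side by side, as a sketch on a toy model written to a temporary directory. It assumes PyTorch 1.13 or later, where torch.load() accepts the weights_only argument; recent releases default it to True, which is why the full-model load passes weights_only=False explicitly:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 2))

with tempfile.TemporaryDirectory() as tmp:
    full_path = os.path.join(tmp, "full_model.pth")
    weights_path = os.path.join(tmp, "model_weights.pth")

    # Full-model save: pickles the module, which refers to its classes by import path
    torch.save(model, full_path)
    # Unpickling arbitrary objects requires weights_only=False; only do this for trusted files
    restored_full = torch.load(full_path, weights_only=False)

    # state_dict save: only tensors, loaded into a freshly built model of the same architecture
    torch.save(model.state_dict(), weights_path)
    restored = nn.Sequential(nn.Linear(5, 2))
    restored.load_state_dict(torch.load(weights_path, weights_only=True))

x = torch.randn(1, 5)
print(torch.equal(restored_full(x), model(x)), torch.equal(restored(x), model(x)))  # True True
```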

Device Mapping during Load

When loading saved weights, device mismatches are common. For instance, a model trained and saved on a GPU stores its tensors with a CUDA device location, and torch.load() tries to restore them to that device by default; on a CPU-only server this raises an error.

To prevent this, we pass the map_location argument to torch.load(). It remaps where the loaded tensors are placed, for example map_location="cpu" or map_location=torch.device("cuda:1"), so weights saved on one device can be loaded on another.
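A sketch of common map_location forms. The file here is saved from a CPU model in a temporary directory, so it only demonstrates the syntax; with a checkpoint saved on a GPU, the same calls would place the tensors on the CPU of a machine without CUDA:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(5, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_weights.pth")
    torch.save(model.state_dict(), path)

    # Common map_location forms
    state_a = torch.load(path, map_location="cpu")                # device string
    state_b = torch.load(path, map_location=torch.device("cpu"))  # torch.device
    state_c = torch.load(path, map_location={"cuda:0": "cpu"})    # remap specific devices
    # Choose the target device at run time
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    state_d = torch.load(path, map_location=device)

for state in (state_a, state_b, state_c):
    print({t.device.type for t in state.values()})  # {'cpu'}
print({t.device.type for t in state_d.values()})    # {'cuda'} on a GPU machine, else {'cpu'}
```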

PyTorch Implementation

We can write PyTorch code to evaluate a model, serialize its weights, and load them back to run predictions.

Evaluating and Saving a Model

This code puts a model in evaluation mode and saves its parameters to a local file (a freshly initialized model stands in for a trained one):

<pre><code class="language-python">import torch
import torch.nn as nn

# Simple model
model = nn.Sequential(nn.Linear(5, 2))
model.eval()  # Set to evaluation mode

# Save the model's parameters (state_dict)
weight_path = "model_weights.pth"
torch.save(model.state_dict(), weight_path)
print("Model weights saved to:", weight_path)</code></pre>

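In this code, we set the model to evaluation mode and save its parameters with torch.save(), which serializes the state_dict to model_weights.pth on disk. Calling eval() does not change what is saved: the state_dict holds the same parameters and buffers in training and evaluation mode.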
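A quick round-trip check, assuming the same toy model: reload the file and compare every tensor with torch.equal(). A temporary directory stands in for the local path so the sketch leaves no file behind:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 2))
model.eval()

with tempfile.TemporaryDirectory() as tmp:
    weight_path = os.path.join(tmp, "model_weights.pth")
    torch.save(model.state_dict(), weight_path)
    reloaded = torch.load(weight_path, map_location="cpu")

# Same keys and bit-identical tensors means the save/load round trip is lossless
original = model.state_dict()
same_keys = original.keys() == reloaded.keys()
same_values = all(torch.equal(original[k], reloaded[k]) for k in original)
print(same_keys, same_values)  # True True
```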

Loading and Resuming Training

To load the weights back into a model, we re-create the same architecture, read the file with torch.load() (passing map_location to handle device mapping), and pass the resulting dictionary to load_state_dict():

<pre><code class="language-python">import torch
import torch.nn as nn

# Re-instantiate the model structure (must match the saved architecture)
loaded_model = nn.Sequential(nn.Linear(5, 2))

# Load weights onto the GPU if one is available, otherwise onto the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loaded_state = torch.load("model_weights.pth", map_location=device)
loaded_model.load_state_dict(loaded_state)
loaded_model.to(device)
loaded_model.eval()
print("Model weights loaded successfully.")

# Run a prediction on an input placed on the same device as the model
x = torch.randn(1, 5, device=device)
with torch.no_grad():
    pred = loaded_model(x)
print("Prediction shape:", pred.shape)  # torch.Size([1, 2])</code></pre>

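torch.load() reads the file and places each tensor on the device given by map_location; load_state_dict() then copies those tensors into the matching parameters and buffers of loaded_model. By default (strict=True) it raises an error if any key is missing or unexpected, which catches architecture mismatches early. After eval() the model is ready for inference. To resume training instead, switch back with train() and also restore the optimizer's state, which must be saved alongside the model's state_dict.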
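Resuming training needs more than the model weights: optimizers such as SGD with momentum keep internal state that also has to be restored. A common pattern is to save a single checkpoint dictionary. The key names (epoch, model_state, optimizer_state), the SGD settings, and the single dummy training step below are illustrative assumptions:

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One dummy training step so the optimizer has momentum buffers to save
loss = model(torch.randn(4, 5)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "checkpoint.pth")
    # Hypothetical checkpoint layout: the keys are a convention, not a PyTorch requirement
    torch.save({"epoch": 1,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict()}, ckpt_path)

    # Later: rebuild the model and optimizer, then restore both states
    ckpt = torch.load(ckpt_path, map_location="cpu")
    resumed_model = nn.Sequential(nn.Linear(5, 2))
    resumed_opt = torch.optim.SGD(resumed_model.parameters(), lr=0.1, momentum=0.9)
    resumed_model.load_state_dict(ckpt["model_state"])
    resumed_opt.load_state_dict(ckpt["optimizer_state"])
    start_epoch = ckpt["epoch"] + 1
    resumed_model.train()  # back to training mode before continuing

print(start_epoch)  # 2
```

The optimizer must be constructed over the new model's parameters before its state is loaded, so that the saved state is attached to the right tensors.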