Model Evaluation and Saving/Loading in PyTorch
Model evaluation assesses final performance on held-out test data, while serialization allows saving and loading model parameters. Managing state dictionaries and device mapping correctly is critical for deploying models in production.
Evaluation Protocols
Accurate evaluation relies on two separate switches: evaluation mode, which changes how layers such as dropout and batch normalization behave, and a no-grad context, which disables gradient tracking.
Evaluation Mode and No-Grad Context
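To evaluate a trained model, we set it to evaluation mode using model.eval(). This disables dropout and switches batch normalization layers from per-batch statistics to the running mean and variance accumulated during training, which are no longer updated. As a result, predictions no longer depend on random dropout masks or on the other examples in the batch.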
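A small sketch of this behavioral switch, using a hypothetical network with one BatchNorm1d and one Dropout layer (the layer sizes and random input are arbitrary). In training mode, repeated calls typically differ because of dropout and the BatchNorm running statistics are updated; in evaluation mode, outputs are repeatable and the running statistics stay fixed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical network with one BatchNorm layer and one Dropout layer.
net = nn.Sequential(
    nn.Linear(4, 8),
    nn.BatchNorm1d(8),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(8, 2),
)
x = torch.randn(16, 4)
bn = net[1]

# Training mode: Dropout zeroes random activations, and BatchNorm normalizes
# with the current batch's statistics while updating its running averages.
net.train()
out_a = net(x)
out_b = net(x)
print("train mode, repeated calls equal:", torch.allclose(out_a, out_b))

# Evaluation mode: Dropout is a no-op and BatchNorm uses its stored
# running_mean / running_var, which are not updated by forward passes.
net.eval()
running_mean_before = bn.running_mean.clone()
out_c = net(x)
out_d = net(x)
print("eval mode, repeated calls equal:", torch.allclose(out_c, out_d))
print("running stats unchanged:", torch.equal(running_mean_before, bn.running_mean))
```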
We wrap the evaluation in a with torch.no_grad() block. This stops gradient tracking, saving GPU memory and increasing evaluation speed since no backward graph is constructed.
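The effect of torch.no_grad() can be checked directly on the output tensor. A minimal sketch with an arbitrary nn.Linear layer and random input:

```python
import torch
import torch.nn as nn

model = nn.Linear(5, 2)
x = torch.randn(3, 5)

# Normal forward pass: autograd records the operations so that
# backward() can be called later, so the output carries a grad_fn.
y_tracked = model(x)
print("tracked:", y_tracked.requires_grad, y_tracked.grad_fn)

# Evaluation pattern: eval() switches layer behavior, and no_grad()
# stops autograd from recording a graph for anything computed inside.
model.eval()
with torch.no_grad():
    y_eval = model(x)
print("no_grad:", y_eval.requires_grad, y_eval.grad_fn)
```

Both outputs hold the same values (a Linear layer behaves identically in both modes); only the autograd bookkeeping differs.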
Metrics Calculation
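During evaluation, we compute performance metrics such as accuracy, precision, recall, and F1-score. Since classification models output raw logits, we apply an activation function (sigmoid for binary or multi-label outputs, softmax for multi-class outputs) to obtain probabilities, then threshold them or take the argmax to obtain predicted labels.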
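A sketch of turning logits into predicted labels, using made-up logit values for illustration:

```python
import torch

# Made-up raw logits for a batch of 4 examples and 3 classes.
logits = torch.tensor([
    [2.0, 0.5, -1.0],
    [0.1, 0.2, 3.0],
    [-0.5, 1.5, 0.0],
    [1.2, 1.0, 0.9],
])

# Multi-class: softmax turns each row into probabilities that sum to 1,
# and argmax over the class dimension gives the predicted label.
probs = torch.softmax(logits, dim=1)
preds = probs.argmax(dim=1)
print(preds)

# Binary: one logit per example; sigmoid gives P(class 1), then threshold at 0.5.
binary_logits = torch.tensor([-2.0, 0.3, 1.7, -0.1])
binary_preds = (torch.sigmoid(binary_logits) > 0.5).long()
print(binary_preds)
```

Because softmax and sigmoid are monotonic, the argmax of the logits equals the argmax of the probabilities (and sigmoid exceeds 0.5 exactly when the logit is positive), so the activation is strictly needed only when the probabilities themselves are used.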
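Accumulating these predictions, together with the true labels, across the entire test set and computing each metric once over the full set gives a more reliable estimate of model performance than averaging per-batch scores, and helps decide whether the model is ready for deployment.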
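A minimal evaluation loop that accumulates predictions and labels, assuming a hypothetical binary classifier (an untrained nn.Linear(4, 1)) and synthetic data; the metric formulas are written out by hand here rather than taken from a metrics library:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical binary classifier and synthetic test data, for illustration only.
model = torch.nn.Linear(4, 1)
features = torch.randn(100, 4)
labels = (features[:, 0] > 0).long()
test_loader = DataLoader(TensorDataset(features, labels), batch_size=32)

all_preds, all_labels = [], []
model.eval()
with torch.no_grad():
    for x, y in test_loader:
        logits = model(x).squeeze(1)  # shape (batch,)
        all_preds.append((torch.sigmoid(logits) > 0.5).long())
        all_labels.append(y)

# Concatenate the batches so metrics are computed once over the whole test set.
preds = torch.cat(all_preds)
targets = torch.cat(all_labels)

tp = ((preds == 1) & (targets == 1)).sum().item()
fp = ((preds == 1) & (targets == 0)).sum().item()
fn = ((preds == 0) & (targets == 1)).sum().item()

accuracy = (preds == targets).float().mean().item()
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"acc={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Counting true positives, false positives, and false negatives over the concatenated tensors avoids averaging per-batch precision or recall, which would give the smaller final batch the same weight as the full ones.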
Serialization and Persistence
Saving and loading parameters is performed using PyTorch's state dictionary serialization.
State Dictionary vs. Full Model Serialization
PyTorch models can be saved in two ways: saving the entire model object or saving only its state dictionary (state_dict). Saving the state_dict, a Python dictionary that maps each parameter and buffer name to its tensor, is generally preferred.
Saving the entire model object pickles a reference to the model's class and the module that defines it, so loading can break if the directory structure or class definitions change. Saving only the state_dict stores plain tensors keyed by name, which avoids these dependency issues and improves code maintainability.
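A quick look at what a state_dict contains for the simple nn.Sequential(nn.Linear(5, 2)) model, plus an in-memory save/load round trip. Here io.BytesIO stands in for a file path, and the weights_only=True argument assumes PyTorch 1.13 or later.

```python
import io

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 2))

# state_dict() maps each parameter/buffer name to its tensor.
state = model.state_dict()
for name, tensor in state.items():
    print(name, tuple(tensor.shape))

# Round-trip through an in-memory buffer (a file path works the same way).
buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)
restored = torch.load(buffer, weights_only=True)

# Loading needs a module whose parameter names and shapes match.
fresh = nn.Sequential(nn.Linear(5, 2))
fresh.load_state_dict(restored)
```

Because the saved object holds only names and tensors, any module that produces the same parameter names and shapes can load it; load_state_dict() raises an error when keys or shapes do not match.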
Device Mapping during Load
When loading saved weights, device mismatches are common. For instance, if a model was trained and saved on a GPU, calling torch.load() on a CPU-only server fails, because each saved tensor records the CUDA device it lived on and is restored to that same device by default.
To prevent this, we pass the map_location argument to torch.load(), which remaps tensor storage from the saved device to the CPU or a different GPU at load time, so the same file works on machines with different hardware.
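map_location accepts several forms: a string, a torch.device, a dictionary that remaps specific locations, or a function. The sketch below exercises each one through a hypothetical roundtrip() helper. The tensor is created on the CPU, so this only demonstrates the call forms; the remapping has a visible effect when the file was written from a GPU. weights_only=True assumes PyTorch 1.13 or later.

```python
import io

import torch


def roundtrip(obj, **load_kwargs):
    """Hypothetical helper: serialize obj to memory and load it back."""
    buf = io.BytesIO()
    torch.save(obj, buf)
    buf.seek(0)
    return torch.load(buf, weights_only=True, **load_kwargs)


state = {"weight": torch.randn(2, 5)}

# Pick whatever device is available on the loading machine.
target = torch.device("cuda" if torch.cuda.is_available() else "cpu")

as_string = roundtrip(state, map_location="cpu")             # everything to CPU
as_device = roundtrip(state, map_location=target)            # to the chosen device
as_dict = roundtrip(state, map_location={"cuda:0": "cpu"})   # remap specific devices
as_fn = roundtrip(state, map_location=lambda storage, loc: storage)  # keep on CPU

for name, loaded in [("str", as_string), ("device", as_device),
                     ("dict", as_dict), ("fn", as_fn)]:
    print(name, loaded["weight"].device)
```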
PyTorch Implementation
We can write PyTorch code to evaluate a model, serialize its weights, and load them back to run predictions.
Evaluating and Saving a Model
This code sets a model to evaluation mode and saves its parameters to a local file (a freshly constructed model stands in for a trained one):
<pre><code class="language-python">import torch
import torch.nn as nn

# Simple model
model = nn.Sequential(nn.Linear(5, 2))
model.eval()  # Set to evaluation mode

# Save the model's parameters (state_dict)
weight_path = "model_weights.pth"
torch.save(model.state_dict(), weight_path)
print("Model weights saved to:", weight_path)</code></pre>
In this code, we set the model to evaluation mode and save its parameters using torch.save(). The weights are stored in a serialized file format on disk.
Loading and Resuming Training
To load the weights back, we instantiate the same model class, read the file with torch.load() (passing map_location to handle device mapping), and hand the resulting dictionary to load_state_dict().
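A minimal sketch of that loading sequence, assuming the same nn.Sequential(nn.Linear(5, 2)) architecture and the model_weights.pth file name from the saving example. It rewrites the file first so it can run on its own, and weights_only=True assumes PyTorch 1.13 or later.

```python
import torch
import torch.nn as nn

weight_path = "model_weights.pth"

# For a self-contained run, write a weights file the same way as the saving example.
torch.save(nn.Sequential(nn.Linear(5, 2)).state_dict(), weight_path)

# Choose the target device on the loading machine.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Re-create the architecture (it must match the one that was saved).
model = nn.Sequential(nn.Linear(5, 2))

# 2. Load the tensors, remapping their storage to the target device.
state_dict = torch.load(weight_path, map_location=device, weights_only=True)

# 3. Copy the tensors into the model's parameters, then move the model too.
model.load_state_dict(state_dict)
model.to(device)
model.eval()

# Run inference on a dummy input placed on the same device.
with torch.no_grad():
    prediction = model(torch.randn(1, 5, device=device))
print(prediction.shape)
```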
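This loading sequence maps the serialized parameters to the target device. Loading the state dict restores the model's trained parameters, which is enough to run inference; to resume training, the optimizer's state_dict (for example, its momentum buffers) should be saved and restored as well.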
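Resuming training typically needs more than the model weights: optimizers such as SGD with momentum or Adam keep per-parameter state that has its own state_dict(). A sketch of a combined checkpoint follows; the dictionary keys ("epoch", "model_state", "optimizer_state") and the checkpoint.pth file name are hypothetical conventions, not something torch.save() requires.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(5, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One illustrative training step so the optimizer has momentum buffers.
loss = model(torch.randn(8, 5)).pow(2).mean()
loss.backward()
optimizer.step()

# Hypothetical checkpoint layout: a plain dict of states plus bookkeeping.
checkpoint_path = "checkpoint.pth"
torch.save({
    "epoch": 1,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}, checkpoint_path)

# Later: rebuild both objects, then restore their states.
resumed_model = nn.Sequential(nn.Linear(5, 2))
resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.1, momentum=0.9)
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=True)
resumed_model.load_state_dict(checkpoint["model_state"])
resumed_optimizer.load_state_dict(checkpoint["optimizer_state"])
start_epoch = checkpoint["epoch"] + 1
resumed_model.train()  # switch back to training mode before continuing
```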