6.6. File I/O

github:
https://github.com/pandalabme/d2l/tree/main/exercises

1. Even if there is no need to deploy trained models to a different device, what are the practical benefits of storing model parameters?

Storing model parameters has several practical benefits even if there is no immediate need to deploy trained models to different devices. Here are some reasons why storing model parameters is valuable:

  1. Faster Model Initialization: When you load pretrained model parameters, you avoid the need to retrain the model from scratch. This can significantly speed up the process of initializing your model for further training or evaluation.

  2. Experiment Reproducibility: Storing model parameters allows you to reproduce experimental results consistently. Researchers and practitioners often share pretrained models along with their parameters to ensure that others can replicate their findings.

  3. Resource Savings: Training deep learning models can be computationally intensive and time-consuming. By storing pretrained parameters, you save computational resources and time by not needing to retrain the model every time you want to use it.

  4. Fine-Tuning: If you have a pretrained model, you can fine-tune it on a specific task by adjusting a smaller set of parameters while keeping the bulk of the model parameters fixed. This can help you achieve better performance with limited training data.

  5. Model Versioning: Storing model parameters allows you to version your models, making it easier to keep track of changes and improvements over time. This is important for model maintenance and updates.

  6. Knowledge Transfer: Model parameters can carry learned knowledge from one task to another, even if the tasks are related but not identical. Transfer learning leverages pretrained models to improve performance on new tasks.

  7. Sharing and Collaboration: Sharing pretrained models, along with their parameters, encourages collaboration and knowledge exchange among researchers and practitioners.

  8. Efficient Storage: Storing only the parameters (the state dict) is far more compact than serializing the entire model object, which makes it feasible to keep multiple versions of a model for different purposes without consuming excessive storage space.

  9. Offline Inference: If you need to make predictions on new data without access to the original training infrastructure, storing model parameters allows you to perform inference offline.

Overall, storing model parameters is a common practice in deep learning due to the efficiency, reproducibility, and versatility it provides, even if immediate deployment to different devices is not a requirement.
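
Most of these benefits rest on the same basic round trip of saving and restoring a state dict. A minimal sketch (using a plain nn.Linear purely for illustration, not the chapter's model):

import torch
import torch.nn as nn

# Persist the learned parameters (the state dict) to disk.
layer = nn.Linear(20, 10)
torch.save(layer.state_dict(), 'layer.params')

# Later: rebuild a layer of the same shape and restore the saved parameters.
restored = nn.Linear(20, 10)
restored.load_state_dict(torch.load('layer.params'))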

2. Assume that we want to reuse only parts of a network to be incorporated into a network having a different architecture. How would you go about using, say the first two layers from a previous network in a new network?

import sys
import warnings
import torch
import torch.nn as nn

sys.path.append('/home/jovyan/work/d2l_solutions/notebooks/exercises/d2l_utils/')
import d2l

warnings.filterwarnings("ignore")

class ReusedMLP(d2l.Classifier):
    """MLP whose first block is a layer (or sub-network) reused from another model."""
    def __init__(self, num_outputs, num_hiddens, lr, reused_layer):
        super().__init__()
        self.save_hyperparameters()
        # Start with the transferred layers, then stack fresh hidden layers on top.
        layers = [reused_layer, nn.ReLU()]
        for num in num_hiddens:
            layers.append(nn.LazyLinear(num))
            layers.append(nn.ReLU())
        layers.append(nn.LazyLinear(num_outputs))
        self.net = nn.Sequential(*layers)
        
# Build the original model, run a forward pass to materialize the lazy layers,
# and save its parameters to disk.
hparams = {'num_hiddens': [256, 128, 64, 32], 'num_outputs': 10, 'lr': 0.1}
model = d2l.MulMLP(**hparams)
X = torch.randn(size=(2, 20))
Y = model(X)
torch.save(model.state_dict(), 'mlp.params')

# Restore the parameters into a clone and reuse its first two layers
# (Flatten + the first LazyLinear) as the front of a new network.
clone = d2l.MulMLP(**hparams)
clone.load_state_dict(torch.load('mlp.params'))
reused_layer = clone.net[:2]
hparams = {'num_hiddens': [8, 4], 'num_outputs': 3, 'lr': 0.1, 'reused_layer': reused_layer}
new_model = ReusedMLP(**hparams)
new_model.net
Sequential(
  (0): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): LazyLinear(in_features=0, out_features=256, bias=True)
  )
  (1): ReLU()
  (2): LazyLinear(in_features=0, out_features=8, bias=True)
  (3): ReLU()
  (4): LazyLinear(in_features=0, out_features=4, bias=True)
  (5): ReLU()
  (6): LazyLinear(in_features=0, out_features=3, bias=True)
)
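
If the transferred layers should serve as a fixed feature extractor rather than being updated further, their parameters can also be frozen. A minimal sketch, reusing reused_layer from above:

# Freeze the transferred layers so that only the newly added layers are trained.
for param in reused_layer.parameters():
    param.requires_grad_(False)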

3. How would you go about saving the network architecture and parameters? What restrictions would you impose on the architecture?

3.1 Saving Model Architecture and Parameters

To save a network’s architecture and parameters, you can use PyTorch’s built-in serialization. One option is to save the entire model object (architecture together with its parameters) with torch.save, which pickles it, and to store the learned parameters on their own as a state dict in a separate file. Here’s how you can do it:

hparams = {'num_hiddens': [256, 128, 64, 32], 'num_outputs': 10, 'lr': 0.1}
model = d2l.MulMLP(**hparams)
# Save the entire model object (architecture together with its parameters) via pickle
torch.save(model, 'model_architecture.pth')

# Save only the learned parameters (the state dict) to a separate file
torch.save(model.state_dict(), 'model_parameters.pth')

# Loading the pickled object restores the full architecture
loaded_architecture = torch.load('model_architecture.pth')
loaded_architecture.net
Sequential(
  (0): Flatten(start_dim=1, end_dim=-1)
  (1): LazyLinear(in_features=0, out_features=256, bias=True)
  (2): ReLU()
  (3): LazyLinear(in_features=0, out_features=128, bias=True)
  (4): ReLU()
  (5): LazyLinear(in_features=0, out_features=64, bias=True)
  (6): ReLU()
  (7): LazyLinear(in_features=0, out_features=32, bias=True)
  (8): ReLU()
  (9): LazyLinear(in_features=0, out_features=10, bias=True)
)

In this example, model_architecture.pth contains the entire pickled model (and therefore its architecture), while model_parameters.pth contains only the learned parameters as a state dict.
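
A common alternative to pickling the whole object is to bundle the constructor arguments together with the state dict in a single checkpoint, so that the architecture is rebuilt from code at load time. A sketch, reusing the hparams and model defined above:

# A forward pass materializes the lazy layers before saving, as in the earlier example.
model(torch.randn(2, 20))
checkpoint = {'hparams': hparams, 'state_dict': model.state_dict()}
torch.save(checkpoint, 'checkpoint.pth')

# Later: rebuild the architecture from its class and restore the parameters.
ckpt = torch.load('checkpoint.pth')
restored = d2l.MulMLP(**ckpt['hparams'])
restored.load_state_dict(ckpt['state_dict'])

This keeps the saved file free of pickled class objects, so it remains loadable as long as the model class itself is importable.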

3.2 Restrictions on Architecture

When saving the architecture and parameters, you should be aware of the following:

  • Module Names and Compatibility: Make sure that the module names in the saved state dict match those of the model you load it into, and that the architecture’s layers and connections are compatible with the expected input sizes and tensor shapes (see the sketch below).

  • Custom Layers: If your architecture includes custom layers or classes, you’ll need to ensure that they are defined and accessible when loading the architecture.

  • Forward Function: The model’s forward function should be defined properly to handle the input tensor sizes correctly. Mismatched tensor shapes can lead to errors.

  • PyTorch Version: Be mindful of the PyTorch version when saving and loading models. Compatibility issues might arise if you try to load a model from a different PyTorch version.

  • Dependencies: If your architecture uses external libraries or dependencies, you’ll need to ensure that those dependencies are available when loading the architecture.

Remember that while saving the architecture and parameters is relatively straightforward, ensuring compatibility and proper handling of the model during loading is crucial to avoid errors and achieve accurate results.
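
As an illustration of the module-name compatibility point, load_state_dict accepts strict=False, which loads the matching entries and returns the lists of missing and unexpected keys for inspection rather than raising on them. A small self-contained sketch with two hypothetical Sequential models:

import torch.nn as nn

# Hypothetical models: `saved` is the old network, `new` extends it with an output layer.
saved = nn.Sequential(nn.Linear(20, 256), nn.ReLU())
new = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 10))

result = new.load_state_dict(saved.state_dict(), strict=False)
print(result.missing_keys)     # ['2.weight', '2.bias'] -- present in `new`, absent from the saved state dict
print(result.unexpected_keys)  # [] -- every saved key found a matching module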

Reference

  1. https://d2l.ai/chapter_builders-guide/read-write.html
