11.9. Large-Scale Pretraining with Transformers

GitHub: https://github.com/pandalabme/d2l/tree/main/exercises
1. Is it possible to fine-tune T5 using a minibatch consisting of different tasks? Why o…

narcissuskid, published on 2023-09-11

11.8. Transformers for Vision

GitHub: https://github.com/pandalabme/d2l/tree/main/exercises
1. How does the value of img_size affect training time?
The value of img_size affects th…

narcissuskid, published on 2023-09-11
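For the 11.8 question on img_size: a ViT-style model splits the image into non-overlapping patch_size × patch_size patches, so the token sequence grows with the square of img_size, and the number of query-key pairs each self-attention layer scores grows with the square of that sequence length. The back-of-the-envelope sketch below is not from the linked post; it assumes square images, an img_size divisible by patch_size, and one extra <cls> token as in the d2l ViT. The sizes it loops over are only illustrative.

```python
def num_patches(img_size: int, patch_size: int) -> int:
    # Non-overlapping square patches: img_size // patch_size along each side.
    return (img_size // patch_size) ** 2


def attention_pairs(img_size: int, patch_size: int) -> int:
    # Sequence length = patches + 1 <cls> token (assumed, as in the d2l ViT);
    # each self-attention layer scores every query against every key.
    seq_len = num_patches(img_size, patch_size) + 1
    return seq_len * seq_len


for img_size in (96, 192, 384):  # illustrative sizes, patch_size fixed at 16
    print(img_size, num_patches(img_size, 16), attention_pairs(img_size, 16))
```

This prints 96 36 1369, then 192 144 21025, then 384 576 332929. Doubling img_size quadruples the patch count but multiplies the attention pairs by roughly 15, while the patch embedding and MLP blocks scale only linearly in the number of patches. Training time can therefore rise much faster than linearly once attention dominates.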

11.7. The Transformer Architecture

GitHub: https://github.com/pandalabme/d2l/tree/main/exercises
1. Train a deeper Transformer in the experiments. How does it affect the training speed…

narcissuskid, published on 2023-09-11

11.6. Self-Attention and Positional Encoding

GitHub: https://github.com/pandalabme/d2l/tree/main/exercises
1. Suppose that we design a deep architecture to represent a sequence by stacking self-a…

narcissuskid, published on 2023-09-11

11.5. Multi-Head Attention

GitHub: https://github.com/pandalabme/d2l/tree/main/exercises
1. Visualize attention weights of multiple heads in this experiment.
import sys
import t…

narcissuskid, published on 2023-09-10

11.4. The Bahdanau Attention Mechanism

GitHub: https://github.com/pandalabme/d2l/tree/main/exercises
1. Replace GRU with LSTM in the experiment.
import sys
import torch.nn as nn
import torc…

narcissuskid, published on 2023-09-10

11.3. Attention Scoring Functions

GitHub: https://github.com/pandalabme/d2l/tree/main/exercises
1. Implement distance-based attention by modifying the DotProductAttention code. Note th…

narcissuskid, published on 2023-09-10
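For the 11.3 exercise on distance-based attention: with the score a(q, k_i) = -\frac{1}{2}\|q - k_i\|^2, expanding gives q^\top k_i - \frac{1}{2}\|k_i\|^2 - \frac{1}{2}\|q\|^2. The last term is identical for every key of a given query, so softmax cancels it, and only the squared key norms need computing on top of the usual dot products. The NumPy sketch below is a standalone stand-in, not a patch to the d2l DotProductAttention class. It keeps the 1/\sqrt{d} scaling, accepts batched (batch, n, d) or unbatched (n, d) inputs, and omits valid_lens masking.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def distance_attention(queries, keys, values):
    """Attention with score -||q - k||^2 / 2, scaled by 1/sqrt(d).

    queries: (..., n_q, d), keys: (..., n_k, d), values: (..., n_k, v).
    Returns (output, weights) with shapes (..., n_q, v) and (..., n_q, n_k).
    """
    d = queries.shape[-1]
    # q.k for every query/key pair, as in dot-product attention.
    dots = queries @ np.swapaxes(keys, -1, -2)
    # Subtract ||k||^2 / 2 per key; the -||q||^2 / 2 term is constant along
    # each row, so softmax would cancel it and it is skipped.
    key_sq_norms = 0.5 * (keys ** 2).sum(axis=-1)[..., None, :]
    weights = softmax((dots - key_sq_norms) / np.sqrt(d), axis=-1)
    return weights @ values, weights


rng = np.random.default_rng(0)
q = rng.normal(size=(2, 3, 4))   # (batch, n_q, d), toy data
k = rng.normal(size=(2, 5, 4))   # (batch, n_k, d)
v = rng.normal(size=(2, 5, 6))   # (batch, n_k, v)
out, w = distance_attention(q, k, v)
print(out.shape, w.shape)        # (2, 3, 6) (2, 3, 5)
```

Dropping the query-norm term keeps the cost at one batched matrix product plus an O(n_k d) pass over the keys, the same order as dot-product attention.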

11.2. Attention Pooling by Similarity

GitHub: https://github.com/pandalabme/d2l/tree/main/exercises
1. Parzen windows density estimates are given by \hat{p}(x) = \frac{1}{n}\sum_i k(x, x_i) …

narcissuskid, published on 2023-09-09
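For the 11.2 exercise: a Parzen window estimate places one kernel bump on each sample and averages them. The sketch below assumes a Gaussian kernel k(x, x_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x - x_i)^2}{2\sigma^2}\right) with an illustrative bandwidth sigma and toy normal data. Because each kernel integrates to 1, so does \hat{p}, and the last lines check this numerically on a grid.

```python
import numpy as np


def gaussian_kernel(x, xi, sigma=1.0):
    # Normalized Gaussian bump centered at xi with bandwidth sigma.
    return np.exp(-(x - xi) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)


def parzen_density(x, samples, sigma=1.0):
    # p_hat(x) = (1/n) * sum_i k(x, x_i), evaluated for every point in x.
    # x: shape (m,), samples: shape (n,) -> result: shape (m,)
    return gaussian_kernel(x[:, None], samples[None, :], sigma).mean(axis=1)


rng = np.random.default_rng(0)
samples = rng.normal(size=100)          # toy data, illustrative only
grid = np.linspace(-10, 10, 4001)
p_hat = parzen_density(grid, samples, sigma=0.5)
dx = grid[1] - grid[0]
print("integral ~", (p_hat * dx).sum())  # close to 1
```

The same averaging of kernel values is what Nadaraya-Watson attention pooling normalizes into weights over the training targets.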

11.1. Queries, Keys, and Values

GitHub: https://github.com/pandalabme/d2l/tree/main/exercises
import sys
import torch.nn as nn
import torch
import warnings
from sklearn.model_selecti…

narcissuskid, published on 2023-09-09

10.7. Sequence-to-Sequence Learning for Machine Translation

GitHub: https://github.com/pandalabme/d2l/tree/main/exercises
import sys
import torch.nn as nn
import torch
import warnings
import numpy as np
from sk…

narcissuskid, published on 2023-09-07