4.3. The Base Classification Model

narcissuskid
Published 2023-08-17 / 755 views / 0 comments / 1 like

github: https://github.com/pandalabme/d2l/tree/main/exercises

1. Denote by L_v the validation loss, and let L_v^q be its quick and dirty estimate computed by the loss function averaging in this section. Lastly, denote by l_v^b the loss on the last minibatch. Express L_v in terms of L_v^q, l_v^b, and the sample and minibatch sizes.

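Assume the validation set contains N samples and each minibatch holds M samples, so there are B = \lceil N/M \rceil minibatches. The first B-1 minibatches are full, while the last one holds the remaining N-(B-1)M samples.
The quick and dirty estimate L_v^q averages the per-minibatch mean losses l_i with equal weight, whatever their size:
L_v^q = \frac{1}{B} \left( \sum_{i=1}^{B-1} l_i + l_v^b \right)
The true validation loss L_v averages over samples, so each minibatch is weighted by its size:
L_v = \frac{1}{N} \left( M \sum_{i=1}^{B-1} l_i + (N-(B-1)M) l_v^b \right)
Substituting \sum_{i=1}^{B-1} l_i = B L_v^q - l_v^b from the first equation gives
L_v = \frac{BM L_v^q - (BM-N) l_v^b}{N}
When M divides N, B = N/M and BM = N, so this reduces to L_v = \frac{M}{N} \sum_{i=1}^{N/M} l_i = L_v^q.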
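The relation can be checked numerically. The sketch below is only an illustration, under the assumption that the last minibatch may be smaller than M: with B = \lceil N/M \rceil minibatches, L_v = (BM L_v^q - (BM - N) l_v^b) / N. The helper names and the per-sample losses are hypothetical and do not come from the d2l code.

```python
import math


def full_and_quick_loss(sample_losses, batch_size):
    """Return (L_v, L_v^q, l_v^b) for per-sample losses split into minibatches."""
    n = len(sample_losses)
    num_batches = math.ceil(n / batch_size)
    batch_means = []
    for i in range(num_batches):
        batch = sample_losses[i * batch_size:(i + 1) * batch_size]
        batch_means.append(sum(batch) / len(batch))
    exact = sum(sample_losses) / n          # L_v: every sample weighted equally
    quick = sum(batch_means) / num_batches  # L_v^q: every minibatch weighted equally
    return exact, quick, batch_means[-1]


def recover_exact(quick, last_batch_loss, n, batch_size):
    """Recover L_v from L_v^q, l_v^b, N and M."""
    bm = math.ceil(n / batch_size) * batch_size  # B * M
    return (bm * quick - (bm - n) * last_batch_loss) / n


# Hypothetical per-sample losses: N = 7, M = 3, so the last minibatch has 1 sample.
losses = [0.5, 1.5, 2.0, 1.0, 3.0, 0.25, 4.0]
exact, quick, last = full_and_quick_loss(losses, batch_size=3)
recovered = recover_exact(quick, last, n=len(losses), batch_size=3)
print(exact, quick, recovered)
```

Here the single-sample last minibatch has a large loss, so L_v^q overstates L_v, while the recovered value matches the exact average up to floating-point error.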

2. Show that the quick and dirty estimate L_v^q is unbiased. That is, show that E[L_v]=E[L_v^q]. Why would you still want to use L_v instead?

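Let l_i denote the mean loss on minibatch i, and assume the validation samples are drawn i.i.d., so that every per-sample loss has the same expectation \mu. Each l_i is an average of such losses, hence E[l_i] = \mu whatever the minibatch size, and E[L_v^q] = \mu. With N divisible by M for simplicity, linearity of expectation gives
E[L_v] = E[\frac{M}{N} \sum_{i=1}^{N/M} l_i] = \frac{M}{N} \sum_{i=1}^{N/M} E[l_i] = \mu = E[L_v^q]
You would still want L_v because it weights every sample equally. When the last minibatch is smaller than M, L_v^q gives its samples more weight than the others. The estimate stays unbiased, but its variance goes up, and for a fixed validation set it no longer equals the exact average loss.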
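A small simulation can illustrate both halves of the question. This is a sketch under assumptions that are not part of the exercise: per-sample losses are drawn i.i.d. from an exponential distribution with mean 1, and N = 7, M = 3 so that the last minibatch holds a single sample. The function name is hypothetical.

```python
import random
import statistics


def exact_and_quick(sample_losses, batch_size):
    """Return (L_v, L_v^q) for one validation set of per-sample losses."""
    batches = [sample_losses[i:i + batch_size]
               for i in range(0, len(sample_losses), batch_size)]
    batch_means = [sum(b) / len(b) for b in batches]
    exact = sum(sample_losses) / len(sample_losses)  # sample-weighted average
    quick = sum(batch_means) / len(batch_means)      # minibatch-weighted average
    return exact, quick


rng = random.Random(0)  # fixed seed so the run is repeatable
n, batch_size, trials = 7, 3, 20000
exact_values, quick_values = [], []
for _ in range(trials):
    # Assumed loss distribution: exponential with mean 1.
    losses = [rng.expovariate(1.0) for _ in range(n)]
    exact, quick = exact_and_quick(losses, batch_size)
    exact_values.append(exact)
    quick_values.append(quick)

mean_exact = statistics.fmean(exact_values)
mean_quick = statistics.fmean(quick_values)
var_exact = statistics.pvariance(exact_values)
var_quick = statistics.pvariance(quick_values)
print(mean_exact, mean_quick, var_exact, var_quick)
```

Both means should land close to 1, consistent with unbiasedness, while the spread of L_v^q across trials should be larger than that of L_v because the single-sample minibatch receives a third of the total weight.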

3. Given a multiclass classification loss, denoting by l(y,y^\prime) the penalty of estimating y^\prime when we see y and given a probability p(y|x), formulate the rule for an optimal selection of y^\prime.

Hint: express the expected loss, using l and p(y|x).

The optimal y^\prime follows from minimizing the expected loss. The true class y is unknown at prediction time, but p(y|x) tells us how likely each class is given the input x. The expected loss of predicting y^\prime is therefore the loss l(y, y^\prime) averaged over y drawn from p(y|x):
\mathbb{E}_{y \sim p(y|x)}[l(y, y^\prime)] = \sum_{y} p(y|x) \cdot l(y, y^\prime)
The optimal rule selects the y^\prime that minimizes this quantity:
y^\prime = \arg\min_{y^\prime} \sum_{y} p(y|x) \cdot l(y, y^\prime)
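The rule can be sketched in a few lines. The snippet below is an illustration only: the posterior values and both loss matrices are made up, and `optimal_prediction` is a hypothetical helper, with `loss[y][y_pred]` standing for l(y, y^\prime).

```python
def optimal_prediction(posterior, loss):
    """Pick the y' that minimizes sum_y p(y|x) * l(y, y').

    posterior[y] is p(y|x); loss[y][y_pred] is l(y, y').
    Returns the chosen class index and the expected loss of every candidate.
    """
    num_classes = len(posterior)
    expected = [
        sum(posterior[y] * loss[y][y_pred] for y in range(num_classes))
        for y_pred in range(num_classes)
    ]
    best = min(range(num_classes), key=expected.__getitem__)
    return best, expected


# Made-up posterior over three classes for a single input x.
posterior = [0.6, 0.3, 0.1]

# 0-1 loss: every mistake costs the same.
zero_one = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

# Asymmetric loss: missing true class 2 costs ten times more.
asymmetric = [[0, 1, 1],
              [1, 0, 1],
              [10, 10, 0]]

choice_zero_one, _ = optimal_prediction(posterior, zero_one)
choice_asymmetric, _ = optimal_prediction(posterior, asymmetric)
print(choice_zero_one, choice_asymmetric)
```

Under the 0-1 loss the rule reduces to picking the most probable class (class 0 here). With the asymmetric loss it picks class 2, even though p(y=2|x) is only 0.1, because wrongly ruling out class 2 is so expensive.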

Reference

  1. https://d2l.ai/chapter_linear-classification/classification.html
