15.6. Subword Embedding

narcissuskid
Published 2023-11-30

github: https://github.com/pandalabme/d2l/tree/main/exercises

1. As an example, there are about $3\times 10^8$ possible 6-grams in English. What is the issue when there are too many subwords? How to address the issue? Hint: refer to the end of Section 3.2 of the fastText paper (Bojanowski et al., 2017).

When the subword vocabulary is that large, the embedding matrix becomes prohibitively expensive to store and train, and most subwords occur so rarely that their vectors cannot be estimated reliably. To address the issue, the fastText paper hashes each subword into a fixed-size set of buckets (an FNV-1a hash into $2\times 10^6$ buckets in the paper) and uses the bucket index as the row of a shared embedding table, so the model size is bounded by the number of buckets rather than by the number of possible subwords. As in word2vec, training is kept efficient by subsampling the most frequent words and by approximating the full softmax over the output vocabulary (e.g. hierarchical softmax or negative sampling).
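
A minimal sketch of the bucket-hashing idea in plain Python; the bucket count of $2\times 10^6$ and the FNV-1a hash match the defaults described in the paper, but the function names here are illustrative:

```python
def fnv1a_hash(subword: str) -> int:
    """32-bit FNV-1a hash of a UTF-8 encoded subword."""
    h = 0x811C9DC5                      # FNV offset basis
    for byte in subword.encode('utf-8'):
        h = ((h ^ byte) * 0x01000193) % (1 << 32)  # XOR, then multiply by the FNV prime
    return h


def subword_bucket(subword: str, num_buckets: int = 2_000_000) -> int:
    """Map a subword to one of `num_buckets` rows of a shared embedding table."""
    return fnv1a_hash(subword) % num_buckets


# A word vector is then the sum of the bucket embeddings of its n-grams,
# e.g. for the 3-grams of "<where>" with boundary symbols:
print([subword_bucket(g) for g in ('<wh', 'whe', 'her', 'ere', 're>')])
```

Collisions are simply accepted: distinct subwords that hash to the same bucket share an embedding, trading a little accuracy for a bounded model size.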

2. How to design a subword embedding model based on the continuous bag-of-words model?

Keep the CBOW architecture but replace its input word embeddings with aggregated subword embeddings: represent each context word as the sum (or average) of the vectors of its character n-grams, average the resulting context word vectors into a single context vector, and feed that vector to an output layer that predicts the center word. Training is unchanged from standard CBOW; only the way an input word vector is composed differs. A minimal sketch is given below.
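
A minimal sketch in PyTorch, assuming each word has already been mapped to a padded list of subword indices (the class and argument names are illustrative, not taken from the book's code):

```python
import torch
from torch import nn


class SubwordCBOW(nn.Module):
    """CBOW in which every input word vector is the mean of its subword vectors."""

    def __init__(self, num_subwords, num_words, embed_dim):
        super().__init__()
        self.subword_embed = nn.Embedding(num_subwords, embed_dim, padding_idx=0)
        self.output = nn.Linear(embed_dim, num_words)  # scores over center words

    def forward(self, context_subwords):
        # context_subwords: (batch, num_context_words, max_subwords),
        # padded with index 0 for words that have fewer subwords.
        embeds = self.subword_embed(context_subwords)             # (B, C, S, D)
        mask = (context_subwords != 0).unsqueeze(-1).float()      # ignore padding
        word_vecs = (embeds * mask).sum(2) / mask.sum(2).clamp(min=1)  # (B, C, D)
        context_vec = word_vecs.mean(1)                           # (B, D)
        return self.output(context_vec)                           # (B, num_words)


# Toy usage: batch of 2, 4 context words, at most 3 subwords per word.
model = SubwordCBOW(num_subwords=100, num_words=50, embed_dim=8)
x = torch.randint(1, 100, (2, 4, 3))
print(model(x).shape)  # torch.Size([2, 50])
```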

3. To get a vocabulary of size m, how many merging operations are needed when the initial symbol vocabulary size is n?

m - n. Each merge operation adds exactly one new symbol to the vocabulary, so starting from n initial symbols, k merges give a vocabulary of size n + k; setting n + k = m yields k = m - n (assuming m > n).

4. How to extend the idea of byte pair encoding to extract phrases?

One possible approach is to apply BPE at the word level rather than the character level: treat each word (instead of each character) as an initial symbol and repeatedly merge the most frequent pair of adjacent words into a new phrase symbol. The learned merges then form a vocabulary of phrases that captures common collocations and idioms of the language, as in the sketch below.
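
A minimal sketch of this word-level variant on a toy corpus; the greedy pair-merging loop below is a simplified illustration rather than the book's BPE implementation:

```python
import collections


def word_level_bpe(corpus, num_merges):
    """Repeatedly merge the most frequent adjacent word pair into a phrase symbol."""
    # Each sentence starts as a list of single-word symbols.
    sentences = [sentence.split() for sentence in corpus]
    phrases = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = collections.Counter()
        for symbols in sentences:
            for pair in zip(symbols[:-1], symbols[1:]):
                pairs[pair] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)      # most frequent adjacent pair
        phrases.append(' '.join(best))
        merged = '_'.join(best)               # new phrase symbol
        # Replace every occurrence of the best pair with the merged symbol.
        new_sentences = []
        for symbols in sentences:
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_sentences.append(out)
        sentences = new_sentences
    return phrases, sentences


corpus = ['new york is a big city', 'i love new york', 'new york new york']
print(word_level_bpe(corpus, num_merges=1)[0])  # ['new york']
```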

References

  1. https://d2l.ai/chapter_natural-language-processing-pretraining/subword-embedding.html
  2. Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135-146.
