MATLAB Answers


Increasing vocabulary of pre-trained word embeddings

Can we extend the pre-trained word embeddings and increase the vocabulary?

1 Answer

Answer by MathWorks Support Team on 19 Jun 2019 at 4:00
 Accepted Answer

Yes. In order to add more words to the existing vocabulary given by 'fastTextWordEmbedding', you can try the following:
1. Obtain the wordEmbedding object for 'fastTextWordEmbedding'-
>> emb = fastTextWordEmbedding;
2. Obtain the vocabulary from the wordEmbedding object:
>> vocab = emb.Vocabulary;
3. Add more words to the string array, for example:
>> vocab(end+1) = 'Hi';
>> vocab(end+1) = 'Hello';
4. Write to a text file with UTF-8 encoding in either the word2vec or GloVe text embedding format, or a zip file containing a text file of this format. You can use fopen, fprintf and fclose for this step:
5. Use 'readWordEmbedding' to read this text file with additional words, to get a new word embedding object. The doc page for 'readWordEmbedding' would explain more about why the file needs to be in the above format.


Sign in to comment.