Increasing vocabulary of pre-trained word embeddings

Can we extend the pre-trained word embeddings and increase the vocabulary?

Accepted Answer

MathWorks Support Team on 2 Sep 2021
Edited: MathWorks Support Team on 27 Sep 2021
Yes. To add more words to the existing vocabulary given by 'fastTextWordEmbedding', you can try the following:
1. Obtain the wordEmbedding object from 'fastTextWordEmbedding':
>> emb = fastTextWordEmbedding;
2. Obtain the vocabulary from the wordEmbedding object:
>> vocab = emb.Vocabulary;
3. Add more words to the string array, for example:
>> vocab(end+1) = 'Hi';
>> vocab(end+1) = 'Hello';
4. Write the words and their vectors to a text file with UTF-8 encoding in either the word2vec or GloVe text embedding format, or to a zip file containing a text file in that format. These formats store one vector per word, so every added word also needs a vector; the vectors of the existing words can be obtained with 'word2vec(emb, emb.Vocabulary)'. You can use fopen, fprintf and fclose for this step.
5. Use 'readWordEmbedding' to read this text file with the additional words and get a new word embedding object. The documentation page for 'readWordEmbedding' describes the required file format.
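
The steps above can be sketched end to end as follows. This is a minimal sketch, not a definitive implementation, and it makes several assumptions that are not part of the original answer: the new words, the file name 'extendedEmbedding.vec', and above all the vectors for the new words, which are simply copied from existing "proxy" words as placeholders. It writes the word2vec text format, assumed here to be a header line with the word count and dimension followed by one "word v1 v2 ... vD" line per word.

```matlab
% Requires Text Analytics Toolbox and the fastText English support package.
emb = fastTextWordEmbedding;

% Existing vocabulary (1-by-N string array) and its vectors (N-by-D matrix).
vocab   = emb.Vocabulary;
vectors = word2vec(emb, vocab);

% Hypothetical new words. Each one borrows the vector of an existing
% "proxy" word as a placeholder (an assumption made for illustration only).
newWords   = ["Hiya" "Howdy"];
proxyWords = ["hi" "hello"];

% Skip words that are already in the vocabulary to avoid duplicate entries.
keep       = ~isVocabularyWord(emb, newWords);
newWords   = newWords(keep);
newVectors = word2vec(emb, proxyWords(keep));
assert(~any(isnan(newVectors(:))), 'A proxy word is not in the vocabulary.');

allWords   = [vocab newWords];
allVectors = double([vectors; newVectors]);

% Write a UTF-8 text file: header line, then one line per word.
filename = 'extendedEmbedding.vec';   % hypothetical file name
fid = fopen(filename, 'w', 'n', 'UTF-8');
assert(fid ~= -1, 'Could not open the output file.');
fprintf(fid, '%d %d\n', size(allVectors, 1), size(allVectors, 2));
for i = 1:numel(allWords)
    fprintf(fid, '%s', char(allWords(i)));   % the word
    fprintf(fid, ' %.6g', allVectors(i, :)); % its D vector components
    fprintf(fid, '\n');
end
fclose(fid);

% Read the file back to get a new wordEmbedding object.
newEmb = readWordEmbedding(filename);
isVocabularyWord(newEmb, newWords)
```

Writing the full vocabulary line by line can take a while, so it may be worth trying the loop on a small subset of words first. Copying a proxy vector only makes the new word behave like its proxy; vectors that carry real meaning would have to come from elsewhere, for example from training.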

Release

R2018b