Increasing vocabulary of pre-trained word embeddings
MathWorks Support Team
on 3 May 2019
Edited: MathWorks Support Team
on 27 Sep 2021
Can we extend the pre-trained word embeddings and increase the vocabulary?
Accepted Answer
MathWorks Support Team
on 2 Sep 2021
Edited: MathWorks Support Team
on 27 Sep 2021
Yes. To add more words to the existing vocabulary provided by 'fastTextWordEmbedding', you can try the following:
1. Obtain the wordEmbedding object from 'fastTextWordEmbedding':
>> emb = fastTextWordEmbedding;
2. Obtain the vocabulary from the wordEmbedding object:
>> vocab = emb.Vocabulary;
3. Add more words to the string array, for example:
>> vocab(end+1) = 'Hi';
>> vocab(end+1) = 'Hello';
4. Write the words and their vectors to a text file with UTF-8 encoding in either the word2vec or GloVe text embedding format, or to a zip file containing a text file in one of these formats. Each line of such a file holds a word followed by the components of its vector, so every added word needs a vector as well. You can use fopen, fprintf and fclose for this step.
5. Use 'readWordEmbedding' to read this text file with the additional words and get a new word embedding object. The documentation page for 'readWordEmbedding' describes the required file format in more detail.
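The steps above can be sketched as follows. This is a minimal sketch, not an official recipe: the new words, the random placeholder vectors, and the file name extendedEmbedding.vec are assumptions made for illustration. Random vectors carry no semantic meaning, so in practice you would substitute vectors obtained some other way (for example, the vector of a related word). Words that already exist in the vocabulary are skipped to avoid duplicate entries in the file.

```matlab
% Requires Text Analytics Toolbox and the fastText English support package.
emb = fastTextWordEmbedding;

% Existing vocabulary and the matching vectors (one row per word).
vocab = emb.Vocabulary;
vecs  = word2vec(emb, vocab);

% Hypothetical new words (assumption); keep only those not already present.
newWords = ["mynewterm1" "mynewterm2"];
newWords = newWords(~isVocabularyMember(emb, newWords));

% Placeholder vectors (assumption): small random values with no meaning.
newVecs = 0.01 * randn(numel(newWords), emb.Dimension);

allWords = [vocab(:); newWords(:)];
allVecs  = [vecs; newVecs];

% Write a word2vec-style text file in UTF-8: a header line with the
% word count and dimension, then one "word v1 v2 ... vD" line per word.
filename = "extendedEmbedding.vec";   % hypothetical file name
fid = fopen(filename, 'w', 'n', 'UTF-8');
if fid == -1
    error("Could not open %s for writing.", filename);
end
closeFile = onCleanup(@() fclose(fid));   % close the file even on error

fprintf(fid, '%d %d\n', numel(allWords), emb.Dimension);
fmt = ['%s' repmat(' %.6g', 1, emb.Dimension) '\n'];
for i = 1:numel(allWords)
    fprintf(fid, fmt, char(allWords(i)), allVecs(i, :));
end
clear closeFile   % triggers fclose before the file is read back

% Read the file back to get a word embedding with the larger vocabulary.
newEmb = readWordEmbedding(filename);
```

Writing the full fastText vocabulary this way produces a large file and can take a while, so it may be worth testing the loop on a small subset of the vocabulary first.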