Handling out-of-vocabulary words in word embedding
I'm using FastText and my own word embedding on a set of documents. The embeddings are used to detect abbreviations (Y/N) for each word token.
When testing, words that do not have vectors (out-of-vocabulary, or OOV, words) are discarded and not included in the performance measures (precision, recall, etc.), which gives a misleading result. How do you handle this?
Should the words with NaN vectors be included in the performance measures? Can the NaN values be replaced with a vector? How would you decide which vector?
Answers (1)
Prince Kumar
on 16 Aug 2021
From my understanding, you want to handle OOV (out-of-vocabulary) words for your abbreviation detection task. For now, MATLAB's fastTextWordEmbedding does not handle OOV words.
There are many ways to do it; the following are two popular ones:
- Use the context around the OOV word. You can use the word embeddings of the words before and after the OOV word, for example by averaging them.
- Use a synonym or a similar word to get the word embedding for your OOV word.
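To make the two approaches from this answer (borrowing from the neighbouring words, or substituting a similar word) more concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the embedding is a toy dictionary rather than a real FastText model, and `embed_tokens`, `mean_vector`, and the `similar_words` map are hypothetical names made up for this example, not part of any library.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def embed_tokens(
    tokens: Sequence[str],
    embedding: Dict[str, Vector],
    similar_words: Optional[Dict[str, str]] = None,
) -> List[Optional[Vector]]:
    """Return one vector per token, filling in OOV tokens where possible.

    Fallbacks for an OOV token, in order:
      1. a hand-supplied similar word (hypothetical `similar_words` map),
      2. the mean of the immediately preceding and following tokens
         that are themselves in the vocabulary.
    None is returned only when neither fallback is available.
    """
    similar_words = similar_words or {}
    result: List[Optional[Vector]] = []
    for i, token in enumerate(tokens):
        if token in embedding:
            # In-vocabulary word: use its own vector.
            result.append(embedding[token])
            continue
        substitute = similar_words.get(token)
        if substitute in embedding:
            # Fallback 1: reuse the vector of a synonym or similar word.
            result.append(embedding[substitute])
            continue
        # Fallback 2: average the in-vocabulary previous/next words.
        neighbours = [
            embedding[tokens[j]]
            for j in (i - 1, i + 1)
            if 0 <= j < len(tokens) and tokens[j] in embedding
        ]
        result.append(mean_vector(neighbours) if neighbours else None)
    return result


# Toy 2-dimensional embedding, invented purely for illustration.
toy_embedding = {
    "the": [0.0, 1.0],
    "doctor": [1.0, 0.0],
    "arrived": [1.0, 1.0],
    "approximately": [0.5, 0.5],
}
tokens = ["the", "dr", "arrived", "approx"]
vectors = embed_tokens(
    tokens, toy_embedding, similar_words={"approx": "approximately"}
)
```

Here "dr" gets the average of "the" and "arrived", while "approx" reuses the vector of "approximately". With a fallback like this, every token receives a vector and a prediction, so OOV words no longer need to be dropped from precision and recall. In MATLAB, I believe the same idea can be built on isVocabularyWord (to find the OOV tokens) and word2vec (which returns NaN rows for them), replacing the NaN rows with the fallback vectors.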