removeInfrequentWords

Remove words with low counts from bag-of-words model

Description

newBag = removeInfrequentWords(bag,count) removes the words that appear at most count times in total, summed over all documents, from the bag-of-words model bag.

Examples

Remove the words that appear two times or fewer from a bag-of-words model.

Create a bag-of-words model from an array of tokenized documents.

documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents)
bag = 
  bagOfWords with properties:

          Counts: [4x8 double]
      Vocabulary: [1x8 string]
        NumWords: 8
    NumDocuments: 4

Remove the words that appear two times or fewer from the bag-of-words model.

count = 2;
newBag = removeInfrequentWords(bag,count)
newBag = 
  bagOfWords with properties:

          Counts: [4x3 double]
      Vocabulary: ["example"    "a"    "short"]
        NumWords: 3
    NumDocuments: 4
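
To check which words a threshold removes, you can sum the Counts matrix of the original model over its documents. This is a sketch for inspection only and does not claim to show how removeInfrequentWords is implemented. The names totals, removedWords, and keptWords are illustrative. The snippet rebuilds the model from the example so that it runs on its own.

```matlab
% Rebuild the example model so this snippet runs on its own
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents);
count = 2;

% Total occurrences of each vocabulary word: Counts is
% numDocuments-by-numWords, so sum down the rows (dimension 1)
totals = full(sum(bag.Counts,1));

% Words with a total of at most count are the ones the function removes
removedWords = bag.Vocabulary(totals <= count)
keptWords = bag.Vocabulary(totals > count)
```

Here "example", "a", and "short" each occur three times in total, so they are the only words whose totals exceed the threshold of 2.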

Input Arguments

bag: Input bag-of-words model, specified as a bagOfWords object.

count: Count threshold for removing words, specified as a positive integer. The function removes the words that appear at most count times in total.
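
Raising count removes progressively more words. The following sketch uses a small hypothetical model built directly from a word list and a count matrix with the bagOfWords(uniqueWords,counts) syntax. The words "alpha", "beta", and "gamma" and their counts are made up for illustration.

```matlab
% Hypothetical toy model: 2 documents (rows) by 3 words (columns)
uniqueWords = ["alpha" "beta" "gamma"];
counts = [1 0 3; 0 2 1];   % totals per word: alpha 1, beta 2, gamma 4
bag = bagOfWords(uniqueWords,counts);

% Apply increasing thresholds and report which words survive each one
for count = 1:3
    newBag = removeInfrequentWords(bag,count);
    fprintf("count = %d keeps %d word(s): %s\n", ...
        count, newBag.NumWords, strjoin(newBag.Vocabulary,", "));
end
```

With count = 3, only "gamma" remains, because its total of 4 is the only one greater than the threshold.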

Introduced in R2017b