Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets, with one sonnet per line and words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.
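The load, split, and tokenize steps above can be sketched in plain Python. This is not the MATLAB workflow. It is a minimal sketch that assumes one document per line with whitespace-separated words. The function name `tokenize_documents` is hypothetical, and the sample string is a short stand-in rather than the actual file contents.

```python
def tokenize_documents(text: str) -> list[list[str]]:
    """Split text into documents at newlines, then split each document into words.

    Blank lines (for example, a trailing newline at the end of the file) are skipped.
    """
    documents = []
    for line in text.splitlines():
        # str.split() with no argument splits on runs of whitespace,
        # so repeated spaces do not produce empty tokens.
        words = line.split()
        if words:
            documents.append(words)
    return documents


# Stand-in text (hypothetical), not the contents of sonnetsPreprocessed.txt.
sample = "from fairest creatures we desire increase\nwhen forty winters shall besiege thy brow\n"
docs = tokenize_documents(sample)
print(len(docs))    # 2
print(docs[1][:3])  # ['when', 'forty', 'winters']
```

To apply the sketch to the real file, pass it the file's text, for example `open("sonnetsPreprocessed.txt", encoding="utf-8").read()`.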
N-gram lengths, specified as a positive integer or a vector of positive integers.

If you specify lengths, then the function removes infrequent n-grams of the specified lengths only. If you do not specify lengths, then the function removes infrequent n-grams regardless of length.
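The effect of restricting removal by n-gram length can be modeled in a few lines of Python. This is a conceptual sketch, not the MATLAB implementation. The names `count_ngrams` and `remove_infrequent_ngrams` are hypothetical. The sketch also assumes that "infrequent" means a total count of at most `count` across all documents.

```python
from collections import Counter


def count_ngrams(documents, lengths):
    """Count n-grams of each length in `lengths`, summed over all documents."""
    counts = Counter()
    for words in documents:
        for n in lengths:
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def remove_infrequent_ngrams(counts, count, ngram_lengths=None):
    """Drop n-grams whose total count is at most `count`.

    If ngram_lengths is None, n-grams of every length are candidates for removal.
    Otherwise, only n-grams whose length is in ngram_lengths can be removed;
    n-grams of other lengths are kept regardless of how often they occur.
    """
    if isinstance(ngram_lengths, int):
        ngram_lengths = [ngram_lengths]
    return Counter({
        gram: c
        for gram, c in counts.items()
        if c > count or (ngram_lengths is not None and len(gram) not in ngram_lengths)
    })


docs = [["a", "b", "a", "b"], ["a", "b", "c"]]
counts = count_ngrams(docs, [1, 2])
# Only bigrams seen at most once are removed; the rare unigram ('c',) survives.
print(sorted(remove_infrequent_ngrams(counts, 1, ngram_lengths=2)))
# [('a',), ('a', 'b'), ('b',), ('c',)]
```

Without `ngram_lengths`, the same call would also remove the unigram `('c',)`. This mirrors the "regardless of length" case described above.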