Count the number of times a word begins with "co" in a text using Text Analytics Toolbox
I have a PDF with news headlines. For each headline, I need the total number of words, the number of words starting with "co", and the number of times the word "price" appears. I do not have much experience with the Text Analytics Toolbox in MATLAB. As far as I can see, "tokenizedDocument" already gives the total number of words (or tokens) per headline, and "context" can be used to find a specific word. However, I do not know how to make MATLAB look for words starting with "co". Also, how do I display this information in a table?
I have attached my PDF and my code.
I really appreciate any help you can provide!
filename = "Factiva_sample_headlines_1.pdf";
str = extractFileText(filename);
textData = split(str,[newline newline]); % split the text into separate headlines at blank lines
textData = textData(~contains(textData,"Page")); % drop entries containing page-number text (split returns a string array, so cellfun does not apply)
cleanedDocuments = tokenizedDocument(textData); % create an array of tokenized documents
Jonas on 21 Apr 2022
Edited: Jonas on 21 Apr 2022
Are you looking for something like this example, applied to your textData?
1×5 logical array
1 0 1 0 1
You can sum that array to get the total number of words starting with "co".
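To make the idea concrete for the per-headline counts and the table, here is a minimal sketch. It is not the code from the original answer: the sample headlines are made up to stand in for textData, the variable names (numWords, numCo, numPrice, results) are only illustrative, and it uses base MATLAB string functions (split, startsWith) with plain whitespace splitting instead of tokenizedDocument.

```matlab
% Hypothetical sample headlines standing in for textData (not taken from the PDF)
headlines = ["Copper price falls as construction slows"
             "Oil companies cut costs"
             "Markets steady ahead of rate decision"];

n = numel(headlines);
numWords = zeros(n,1);   % total words per headline
numCo    = zeros(n,1);   % words starting with "co" per headline
numPrice = zeros(n,1);   % occurrences of the word "price" per headline

for k = 1:n
    % Split at whitespace and lowercase so "Copper" also matches "co"
    words = lower(split(strtrim(headlines(k))));
    numWords(k) = numel(words);
    numCo(k)    = sum(startsWith(words,"co"));  % logical array summed to a count
    numPrice(k) = sum(words == "price");
end

% Collect everything in one table, one row per headline
results = table(headlines, numWords, numCo, numPrice, ...
    'VariableNames', {'Headline','NumWords','NumCo','NumPrice'});
disp(results)
```

If you would rather stay with the tokenized documents, doclength(cleanedDocuments) should give the token count per document, and string(cleanedDocuments(k)) should return the words of a single document so that the same startsWith step applies. Keep in mind that tokenizedDocument treats punctuation as separate tokens, so those counts may differ from whitespace splitting.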