How can I get the word count of each line from an extracted PDF file
Hi, I extracted text from a PDF file with many lines/entries of comments. I want to get the word count of each line, the average word count across all lines, and the number of lines that have only one word. Is this possible? Thanks!
Answers (1)
Kiran Felix Robert
on 2 Feb 2021
Hi Yao,
I assume that you have extracted the text from the PDF file into a string variable. You can convert the string to a character array (convertStringsToChars) and then count the words and lines.
Assume that:
- Every word ends with a space
- Every line ending has a carriage return and line feed
Using the built-in MATLAB example, the following program gives you the total line count and word count in the section of the file.
% extractFileText requires Text Analytics Toolbox
str = extractFileText("exampleSonnets.pdf");
% Extract the section between the headings "II" and "III"
ii = strfind(str,"II");
iii = strfind(str,"III");
start = ii(1);
fin = iii(1);
stringText = extractBetween(str,start,fin-1);
B = convertStringsToChars(stringText);
% Define the space character and end-of-line character.
% These are read from the text itself: in this section the 3rd character
% is taken to be a space and the 4th a carriage return.
SpaceCharacter = B(3);
CarriageReturnCharacter = B(4);
lineCount = 0;
wordCount = 0;
i = 1;
while i <= length(B)
    if B(i) == CarriageReturnCharacter
        lineCount = lineCount + 1; % Total line count
    end
    if B(i) == SpaceCharacter
        wordCount = wordCount + 1; % Total word count
    end
    i = i + 1;
end
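The loop above gives totals for the whole section. For the per-line numbers asked about in the question (word count of each line, the average, and the number of one-word lines), a minimal sketch could look like the following. It is not part of the original answer: the sample text and the variable names (txt, textLines, wordsPerLine) are made up for illustration, a "word" is assumed to be any run of non-whitespace characters, and blank lines are assumed to be skipped.

```matlab
% Hypothetical sample standing in for the text extracted from the PDF
txt = string(sprintf('Good product overall\r\nGreat\r\n\r\nWould buy again\r\n'));

textLines = splitlines(txt);                      % one string per line (handles \n and \r\n)
textLines = strtrim(textLines);                   % remove leading/trailing whitespace
textLines = textLines(strlength(textLines) > 0);  % assumption: ignore blank lines

% Count the runs of non-whitespace characters on each line
wordsPerLine = arrayfun(@(s) numel(regexp(s, '\S+', 'match')), textLines);

averageWords = mean(wordsPerLine);      % average word count over all lines
oneWordLines = nnz(wordsPerLine == 1);  % number of lines with exactly one word
```

To use it on the real data, replace txt with the output of extractFileText (or with stringText from the code above). Matching on '\S+' rather than counting spaces means a line does not need a trailing space after its last word.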
Kiran