Reading/fetching text from text/PDF file for pre-processing
I have text/PDF files that contain millions of words. If I use str = extractFileText(filename), MATLAB becomes very slow and sometimes hangs. The variable also cannot hold that much data.
I want to read the file word by word so that I can filter the text and build a smaller array of filtered data. Alternatively, I want to write the filtered data to a temporary file for the next processing step (as it will be small).
I need help with this. If you have any other solution to my problem, please reply.
2 Comments
Did you try calling the function with a name-value pair, so that it reads only part of the file on each call?
% pages: vector of the page numbers to read, defined beforehand
for i = 1:numel(pages)
    str = extractFileText(filename, 'Pages', pages(i)); % read only one page at a time
    % do whatever you want with str
end
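For plain-text files, the "read, filter, write a small temp file" idea from the question can also be done with base MATLAB file I/O, so that only one line is in memory at a time. The following is only a minimal sketch under some assumptions: the function name filterLargeTextFile, the keepWord predicate, and the UTF-8 encoding are all hypothetical choices, and "word" here simply means a run of non-whitespace characters.

```matlab
function nKept = filterLargeTextFile(inFile, outFile, keepWord)
% Sketch (hypothetical helper): stream inFile line by line, keep the words
% for which keepWord(word) returns true, and write them one per line to outFile.
% Assumes the input is UTF-8 encoded; adjust the encoding if it is not.

    fin = fopen(inFile, 'r', 'n', 'UTF-8');
    if fin == -1
        error('Could not open input file: %s', inFile);
    end
    closeIn = onCleanup(@() fclose(fin));    % close the input even on error

    fout = fopen(outFile, 'w', 'n', 'UTF-8');
    if fout == -1
        error('Could not open output file: %s', outFile);
    end
    closeOut = onCleanup(@() fclose(fout));  % close the output even on error

    nKept = 0;
    line = fgetl(fin);                       % returns -1 (numeric) at end of file
    while ischar(line)
        % Split the line into runs of non-whitespace characters.
        words = regexp(line, '\S+', 'match');
        for k = 1:numel(words)
            if keepWord(words{k})
                fprintf(fout, '%s\n', words{k});
                nKept = nKept + 1;
            end
        end
        line = fgetl(fin);
    end
end
```

A call such as n = filterLargeTextFile('big.txt', [tempname '.txt'], @(w) numel(w) > 3) would keep only words longer than three characters (the file name and the filter are placeholders). For PDFs, the same kind of filtering could be applied to each page's str inside the loop above, appending the kept words to the output file instead of accumulating them in one variable.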
moin khan
on 21 Mar 2021
Accepted Answer
More Answers (0)