
With extractHTMLText I have harvested a news article. How can I write paragraph-long blocks to a text file?

The text analysis function produced clean ASCII text from a very complex newspaper article using the following code (which worked well!):
url = "https://www.staradvertiser.com/2021/08/22/editorial/on-politics/on-politics-gov-david-iges-handling-of-covid-19-hobbled-by-indecision-inadequate-staffers/";
code = webread(url);
str = extractHTMLText(code)
Each paragraph became a line of text. How can I write these lines to an ASCII file for import into a text-processing program? One paragraph per line of the output file (TXT or XLSX) would be best.

Answers (1)

Vatsal on 21 Feb 2024
Hi,
To write the extracted text to an ASCII file with one paragraph per line, first divide the text into paragraphs. In MATLAB you can do this with the "split" function, which divides text at the delimiters you specify and returns a string array for string input (or a cell array of character vectors for character-vector input).
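The splitting step can be checked on its own before touching any files. This is a minimal sketch that uses a hypothetical three-paragraph string in place of the downloaded article, and splits it at the newline character:

```matlab
% Hypothetical three-paragraph text standing in for the output of extractHTMLText
str = "First paragraph." + newline + "Second paragraph." + newline + "Third paragraph.";

% Split at the newline character; the result is a 3-by-1 string array
parts = split(str, newline);

disp(numel(parts))   % displays 3
disp(parts(2))       % displays the second paragraph
```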
Here is the modified code to write each paragraph to a text file:
url = "https://www.staradvertiser.com/2021/08/22/editorial/on-politics/on-politics-gov-david-iges-handling-of-covid-19-hobbled-by-indecision-inadequate-staffers/";
code = webread(url);
str = extractHTMLText(code);
% split does not interpret '\n' as an escape sequence, so pass the newline character instead
str_split = split(str, newline);                 % Split the text into paragraphs (one per line)
str_split = str_split(strlength(str_split) > 0); % Drop empty lines between paragraphs
fileID = fopen('output.txt','w');                % Open 'output.txt' for writing. Change the name as required.
if fileID == -1
    error('Could not open output.txt for writing.');
end
for i = 1:numel(str_split)
    fprintf(fileID,'%s\n',str_split(i));         % Write each paragraph on its own line
end
fclose(fileID);                                  % Close the file when you are done
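The question also mentions XLSX output. Assuming R2019a or later (the question lists R2020a), writecell can put one paragraph in each row of a spreadsheet. This is a sketch in which the sample str_split values stand in for the paragraphs produced above, and 'output.xlsx' is a placeholder file name:

```matlab
% Hypothetical paragraphs standing in for str_split from the code above
str_split = ["First paragraph."; "Second paragraph, with a comma."; "Third paragraph."];

% writecell expects a cell array; cellstr converts the N-by-1 string array
C = cellstr(str_split);

% Each cell is written to its own row in column A of the first sheet
writecell(C, 'output.xlsx');

% Optional check: read the sheet back and compare with what was written
T = readcell('output.xlsx');
disp(isequal(T, C))
```

Writing to a spreadsheet avoids delimiter problems, since commas inside a paragraph stay in the same cell.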
I hope this helps!


Release

R2020a
