Split function for text (split(...) or strsplit(...))

Humberto Bernal on 6 Jun 2019
Edited: Adam Danz on 11 Jun 2019
Hello,
I have a PDF document. How can I divide it at full stops (into whole paragraphs) using split(...) or strsplit(...)? For instance, for the text below, which consists of two paragraphs, I need to split the text into two paragraphs divided at the full stop.
"
In economics the demand curve is the graphical representation of the relationship between the price and the quantity that consumers are willing to purchase. The curve shows how the price of a commodity or service changes as the quantity demanded increases. Every point on the curve is an amount of consumer demand and the corresponding market price. The graph shows the law of demand, which states that people will buy less of something if the price goes up and vice versa.
The slope of a linear demand curve is constant. The elasticity of demand changes continuously as one moves down the demand curve because the ratio of price to quantity continuously falls. At the point the demand curve intersects the y-axis PED is infinitely elastic, because the variable Q appearing in the denominator of the elasticity formula is zero there. At the point the demand curve intersects the x-axis PED is zero, because the variable P appearing in the numerator of the elasticity formula is zero there.[2] At one point on the demand curve PED is unitary elastic: PED equals one. Above the point of unitary elasticity is the elastic range of the demand curve (meaning that the elasticity is greater than one). Below is the inelastic range, in which the elasticity is less than one. The decline in elasticity as one moves down the curve is due to the falling P/Q ratio.
"
Thanks.
  6 Comments
Humberto Bernal on 6 Jun 2019
Yes, I have empty elements in the cell array for the lines that are between paragraphs. How can I specify these empty cells in the command t = split(str, "?")?
Thanks Adam.
Adam Danz on 6 Jun 2019
See my answer below.


Answers (1)

Adam Danz on 6 Jun 2019
Edited: Adam Danz on 6 Jun 2019
Try this out. I don't have your data so I'm taking a shot in the dark. It may require a small tweak.
str = extractFileText(filename); % requires Text Analytics Toolbox
t = split(str{:},newline); % one cell per line of text
emptyLineIdx = cellfun(@isempty,t); % find empty rows
paraGroups = cumsum(emptyLineIdx)+1; % assign paragraph group number to each line
t(emptyLineIdx) = []; % get rid of the empty lines
paraGroups(emptyLineIdx) = [];
paraGroups = findgroups(paraGroups); % renumber as 1..N; splitapply does not allow gaps in group numbers
c = splitapply(@(x){strjoin(x,'\n')},t,paraGroups) % produce cell array; one element per paragraph.
I feel like there's a more direct way to do this but this approach should also work. I wonder if there's a "new paragraph" indicator in regular expressions.
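On the regular-expression idea: as far as I know there is no dedicated "new paragraph" token, but a blank line can be matched as two line breaks with optional whitespace between them. Here is a minimal sketch under that assumption. The sample text and the variable names txt and paras are made up for illustration; with a real file, the text would come from extractFileText instead.

```matlab
% Made-up sample text standing in for the text extracted from the PDF
txt = sprintf('Line 1 of para one.\nLine 2 of para one.\n\n\nPara two.\n');

% Treat "line break, optional whitespace, line break" as a paragraph break.
% The optional \r covers Windows-style line endings.
paras = regexp(txt, '\r?\n\s*\r?\n', 'split');

paras = strtrim(paras);                 % remove leading/trailing whitespace
paras(cellfun(@isempty, paras)) = [];   % drop empty pieces, if any
```

This only needs base MATLAB, and runs of several blank lines are swallowed by a single match, so no regrouping step is needed. Whether it works on a given PDF depends on the extracted text actually containing blank lines between paragraphs.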
  2 Comments
Adam Danz on 7 Jun 2019
Edited: Adam Danz on 11 Jun 2019
Have you tried this out?
Stephen23 on 7 Jun 2019
Humberto Bernal's "Answer" moved here:
Thanks Adam for your answer,
Yes, I want two sentences: the first one has to contain the first paragraph and the second one has to contain the second paragraph. The idea is that I can analyse the text by paragraphs, which are divided by a full stop (.).
rng('default')
filename = "Deamand.pdf";
str = extractFileText(filename);
data = readPDFFormData(filename);
newDocuments = strsplit(str, "?");
newDocuments_1 = erasePunctuation(newDocuments);
.
.
.
.
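A note on the "?" delimiter in the code above: split treats the delimiter as literal text, so "?" splits at question marks and does not act as a wildcard. The following is a minimal sketch of the difference, using made-up sample text rather than the PDF and assuming paragraphs are separated by a single blank line. The names txt, byQuestionMark and paragraphs are illustrative.

```matlab
% Made-up sample text standing in for the output of extractFileText
txt = "Demand falls as price rises." + newline + ...
      "This is the law of demand." + newline + newline + ...
      "The slope of a linear demand curve is constant.";

% "?" is taken literally; the text has no question mark, so nothing is split
byQuestionMark = split(txt, "?");

% Two consecutive newline characters (a blank line) as the literal delimiter
paragraphs = split(txt, [newline newline]);
```

Here paragraphs is a 2-by-1 string array with one element per paragraph, while byQuestionMark still holds the whole text as a single element.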
