How do I read the text between repeating SGML tags?

I have an SGML file that contains some custom tags. I used a regular expression to read the data between the tags, but because my code has no loop it prints only the first set of values. Can someone help me write a proper loop so that I get the values for every occurrence of those tags?
My SGML data looks like the sample below. I want to read the data inside the TOPICS and BODY tags. I have also attached my MATLAB code.
<TOPICS><D>cocoa</D></TOPICS>
<PLACES><D>el-salvador</D><D>usa</D><D>uruguay</D></PLACES>
<PEOPLE></PEOPLE>
<ORGS></ORGS>
<EXCHANGES></EXCHANGES>
<COMPANIES></COMPANIES>
<UNKNOWN>
&#5;&#5;&#5;C T
&#22;&#22;&#1;f0704&#31;reute
u f BC-BAHIA-COCOA-REVIEW 02-26 0105</UNKNOWN>
<TEXT>&#2;
<TITLE>BAHIA COCOA REVIEW</TITLE>
<DATELINE> SALVADOR, Feb 26 - </DATELINE><BODY>Showers continued throughout the week in
the Bahia cocoa zone, alleviating the drought since early
January and improving prospects for the coming temporao,
although normal humidity levels have not been restored,
Comissaria Smith said in its weekly review.
The dry period means the temporao will be late this year.
Arrivals for the week ended February 22 were 155,221 bags
of 60 kilos making a cumulative total for the season of 5.93
mln against 5.81 at the same stage last year. Again it seems
that cocoa delivered earlier on consignment was included in the
arrivals figures.
Cake sales were registered at 785 to 995 dlrs for
March/April, 785 dlrs for May, 753 dlrs for Aug and 0.39 times
New York Dec for Oct/Dec.
Buyers were the U.S., Argentina, Uruguay and convertible
currency areas.
Liquor sales were limited with March/April selling at 2,325
and 2,380 dlrs, June/July at 2,375 dlrs and at 1.25 times New
York July, Aug/Sept at 2,400 dlrs and at 1.25 times New York
Sept and Oct/Dec at 1.25 times New York Dec, Comissaria Smith
said.
Total Bahia sales are currently estimated at 6.13 mln bags
against the 1986/87 crop and 1.06 mln bags against the 1987/88
crop.
Final figures for the period to February 28 are expected to
be published by the Brazilian Cocoa Trade Commission after
carnival which ends midday on February 27.
Reuter
&#3;</BODY></TEXT>
</REUTERS>
<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="5545" NEWID="2">
<DATE>26-FEB-1987 15:02:20.00</DATE>
<TOPICS></TOPICS>
<PLACES><D>usa</D></PLACES>
<PEOPLE></PEOPLE>
<ORGS></ORGS>
<EXCHANGES></EXCHANGES>
<COMPANIES></COMPANIES>
<UNKNOWN>
&#5;&#5;&#5;F Y
&#22;&#22;&#1;f0708&#31;reute
d f BC-STANDARD-OIL-<SRD>-TO 02-26 0082</UNKNOWN>
<TEXT>&#2;
<TITLE>STANDARD OIL <SRD> TO FORM FINANCIAL UNIT</TITLE>
<DATELINE> CLEVELAND, Feb 26 - </DATELINE><BODY>Standard Oil Co and BP North America
Inc said they plan to form a venture to manage the money market
borrowing and investment activities of both companies.
BP North America is a subsidiary of British Petroleum Co
Plc <BP>, which also owns a 55 pct interest in Standard Oil.
The venture will be called BP/Standard Financial Trading
and will be operated by Standard Oil under the oversight of a
joint management committee.
Reuter
&#3;</BODY></TEXT>
</REUTERS>
<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="5546" NEWID="3">
<DATE>26-FEB-1987 15:03:27.51</DATE>
<TOPICS></TOPICS>
<PLACES><D>usa</D></PLACES>
<PEOPLE></PEOPLE>
<ORGS></ORGS>
<EXCHANGES></EXCHANGES>
<COMPANIES></COMPANIES>
<UNKNOWN>
&#5;&#5;&#5;F A
&#22;&#22;&#1;f0714&#31;reute
d f BC-TEXAS-COMMERCE-BANCSH 02-26 0064</UNKNOWN>
<TEXT>&#2;
<TITLE>TEXAS COMMERCE BANCSHARES <TCB> FILES PLAN</TITLE>
<DATELINE> HOUSTON, Feb 26 - </DATELINE><BODY>Texas Commerce Bancshares Inc's Texas
Commerce Bank-Houston said it filed an application with the
Comptroller of the Currency in an effort to create the largest
banking network in Harris County.
The bank said the network would link 31 banks having
13.5 billion dlrs in assets and 7.5 billion dlrs in deposits.
Reuter
&#3;</BODY></TEXT>
</REUTERS>
<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="5547" NEWID="4">
<DATE>26-FEB-1987 15:07:13.72</DATE>
<TOPICS></TOPICS>
<PLACES><D>usa</D><D>brazil</D></PLACES>
<PEOPLE></PEOPLE>
<ORGS></ORGS>
<EXCHANGES></EXCHANGES>
<COMPANIES></COMPANIES>
<UNKNOWN>
&#5;&#5;&#5;F
&#22;&#22;&#1;f0725&#31;reute
u f BC-TALKING-POINT/BANKAME 02-26 0105</UNKNOWN>
<TEXT>&#2;
<TITLE>TALKING POINT/BANKAMERICA <BAC> EQUITY OFFER</TITLE>
<AUTHOR> by Janie Gabbett, Reuters</AUTHOR>
<DATELINE> LOS ANGELES, Feb 26 - </DATELINE><BODY>BankAmerica Corp is not under
pressure to act quickly on its proposed equity offering and
would do well to delay it because of the stock's recent poor
performance, banking analysts said.
Some analysts said they have recommended BankAmerica delay
its up to one-billion-dlr equity offering, which has yet to be
approved by the Securities and Exchange Commission.
BankAmerica stock fell this week, along with other banking
issues, on the news that Brazil has suspended intere

Answers (1)

Walter Roberson on 29 Nov 2016
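The easiest way to do this is to call regexp() with named tokens and the 'names' option. For a character-vector input, this returns a struct array with one element per match, so every record is collected in a single call. See https://www.mathworks.com/help/matlab/ref/regexp.html#btqhkwj-7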
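As a minimal sketch of that approach, the snippet below builds two abbreviated records inline (standing in for the real file, which would normally be read with fileread) and extracts the TOPICS and BODY text from each. The token names `topics` and `body`, the variable names, and the shortened sample text are illustrative assumptions, not taken from the asker's code.

```matlab
% Two abbreviated records standing in for the real file contents.
% With a real file, txt = fileread(...) would replace this block.
txt = strjoin({ ...
    '<TOPICS><D>cocoa</D></TOPICS>', ...
    '<TEXT><TITLE>BAHIA COCOA REVIEW</TITLE>', ...
    '<BODY>Showers continued throughout the week in', ...
    'the Bahia cocoa zone.</BODY></TEXT>', ...
    '<TOPICS></TOPICS>', ...
    '<TEXT><TITLE>STANDARD OIL TO FORM FINANCIAL UNIT</TITLE>', ...
    '<BODY>Standard Oil Co and BP North America', ...
    'Inc said they plan to form a venture.</BODY></TEXT>'}, char(10));

% Named tokens are written (?<name>expr). The lazy quantifier .*? stops at
% the first closing tag. By default '.' also matches newlines in MATLAB,
% so a BODY that spans several lines is captured whole.
expr = '<TOPICS>(?<topics>.*?)</TOPICS>.*?<BODY>(?<body>.*?)</BODY>';
recs = regexp(txt, expr, 'names');   % 1-by-N struct array, one element per match

for k = 1:numel(recs)
    % Split the TOPICS text into its individual <D>...</D> entries.
    d = regexp(recs(k).topics, '<D>(.*?)</D>', 'tokens');
    recs(k).topics = [d{:}];         % cell array of topic names, empty if none
    fprintf('Record %d: %d topic(s), %d characters of body text\n', ...
            k, numel(recs(k).topics), numel(recs(k).body));
end
```

One caveat with this pattern: if a record has no BODY element, the lazy `.*?` between the two tags will run on into the next record and pair the wrong TOPICS with the wrong BODY. If that can happen in your file, it is probably safer to match each `<REUTERS ...>` to `</REUTERS>` block first and then search inside each block.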
