How to read particular dates from an HTML script?

I have downloaded the HTML script of a few webpages; one is uploaded for reference. My task is to extract some information from this script. Latitude and longitude are given on line 851, and I extracted them using the following code:
filename=strcat(pwd,'/',num2str(site(i))); % file to be read, which is same as the file uploaded
fileID=fopen(filename); % fileID
open_file=textscan(fileID,'%s','%f'); % parsing the file
open_file=open_file{1,1};
lat_id=find(ismember(open_file,... % Finding the position of Latitude in text-file
'<dd>Latitude'));
long_id=find(ismember(open_file,... % Finding the position of Longitude in text-file
'Longitude'));
lat(i)=open_file(lat_id+1); % latitude
long(i)=open_file(long_id+1); % longitude
proj=open_file(long_id+3); % projection type, e.g., NAD27, NAD83
But I am not able to use similar code to read the data on line 865, which contains the time range of some data. The problem is that the variable open_file does not seem to contain these values. Any suggestions would be helpful.

Accepted Answer

Walter Roberson on 29 Jan 2018
filename=strcat(pwd,'/',num2str(site(i))); % file to be read, which is same as the file uploaded
S = fileread(filename);
place_info = regexp(S, 'Latitude\s+(?<lat>[^ ,]+),\s*\S+\s*Longitude\s+(?<long>\S+)\s*\S+\s*(?<proj>\w+)', 'names', 'once');
periods_info = regexp(S, '''begin_date''[^\d]*(?<begin_date>\d+-\d+(-\d+)?).*?end_date[^\d]*(?<end_date>\d+-\d+(-\d+)?).*?sites_selection_links\W*(?<stats_type>[^<]+)', 'names');
other_info = regexp(S, 'site_no=\d+">(?<stats_type>.*?)</a>.*?''begin_date''[^\d]*?(?<begin_date>\d+(-\d+(-\d+)?)?).*?end_date[^\d]*?(?<end_date>\d+(-\d+(-\d+)?)?)', 'names');
combined_info = [periods_info, other_info];
Now:
place_info is a struct with the fields lat, long, and proj, holding the latitude, longitude, and projection. The lat and long values are kept in the form in which they were stored in the file, so they may contain an HTML character entity (for example &#176;) standing in for the ° symbol.
combined_info is a struct array with the fields begin_date, end_date, and stats_type. stats_type describes what each period refers to. In the sample data file the values are
'Daily Statistics', 'Monthly Statistics', 'Annual Statistics', 'Current / Historical Observations', 'Peak streamflow', 'Field measurements', 'Field/Lab water-quality samples', and 'Water-Year Summary'.
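As a rough illustration of how the named-token output can be used afterwards, the sketch below runs a simplified pattern on a made-up string and converts the captured dates with datetime. The sample text, the simplified pattern, and the names S_demo, pattern, and info are assumptions for illustration only; they are not the real page contents or the patterns above.

```matlab
% Hypothetical stand-in for a fragment of the page; the real layout may differ.
S_demo = ['<a href="/dv?site_no=01234567">Daily Statistics</a> ' ...
          'begin_date=1990-10-01 end_date=2018-01-28 ' ...
          '<a href="/peak?site_no=01234567">Peak streamflow</a> ' ...
          'begin_date=1991-03-04 end_date=2017-05-06'];

% Simplified pattern with three named tokens (assumes full yyyy-MM-dd dates).
pattern = ['site_no=\d+">(?<stats_type>[^<]+)</a>.*?' ...
           'begin_date\D*(?<begin_date>\d+-\d+-\d+).*?' ...
           'end_date\D*(?<end_date>\d+-\d+-\d+)'];

% With 'names' and no 'once', each match becomes one element of a struct array.
info = regexp(S_demo, pattern, 'names');

for k = 1:numel(info)
    % Convert the captured text to datetime values.
    t0 = datetime(info(k).begin_date, 'InputFormat', 'yyyy-MM-dd');
    t1 = datetime(info(k).end_date,   'InputFormat', 'yyyy-MM-dd');
    span_days = days(t1 - t0);   % length of the period in days
    fprintf('%s: %s to %s (%d days)\n', info(k).stats_type, ...
            info(k).begin_date, info(k).end_date, span_days);
end
```

If some periods are stored as year-month only, as the optional (-\d+)? groups in the patterns above allow, the 'InputFormat' would have to be chosen per entry, for example by checking the length of the captured string first.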
1 Comment
Abhinav on 29 Jan 2018
Edited: Abhinav on 29 Jan 2018
Thanks a lot! I did not know that these routines exist in MATLAB.
