Downloading files from a website with conditions on the file names
The directory on the website contains files whose names start with the letter "A" or "B".
The filenames look like this:
A_20080403.xml 
A_20080403_1.xml
A_20080403_2.xml
A_20080404_1.xml
B_20080403_1.xml
That is:
- Filenames have the form "Capital letters"+"_"+"date"+"_"+"numbers".xml or "Capital letters"+"_"+"date".xml
- Some dates have no corresponding files
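The naming rule above can be written as a regular expression, which may help when filtering a list of names. This is a minimal sketch that assumes the date is always eight digits, as in the examples; the sample names, including the deliberately malformed last one, are only for illustration.

```matlab
% Capitals, "_", an 8-digit date, an optional "_" plus digits, then ".xml".
% The 8-digit date is an assumption based on the example filenames.
pattern = '^[A-Z]+_\d{8}(_\d+)?\.xml$';

% Illustrative names; the last one lacks the underscore and should be rejected.
names = ["A_20080403.xml", "A_20080403_1.xml", "B_20080403_1.xml", "A20080403.xml"];

% regexp returns one cell per name, empty where the pattern does not match.
isValid = ~cellfun(@isempty, regexp(names, pattern, 'once'));

% Keep only well-formed names that start with "A_".
wanted = names(isValid & startsWith(names, "A_"));
disp(wanted)
```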
 
I would like to download all the files whose filenames start with the letter "A".
What has been tried:
(a) I was able to save a single file using the "websave" command.
(b) I asked the question at https://www.mathworks.com/matlabcentral/answers/457470-writing-loops-to-download-files-using-matlab-websave?s_tid=srchtitle and received the following code:
for k = 20080401:20100101
    filename = sprintf('A%d.xml', k);
    url = ['https://www.somecompany.com/xml/' filename];
    outfilename = websave(filename,url);
end 
Problems with the above code: it does not work because
- It assumes filenames of the form "Capital letters"+"date".xml, not the forms described above
- It throws an error at the first date that has no corresponding file, and stops there
 
How can the above code be improved?
Answers (1)
Walter Roberson on 9 Feb 2022
It would be more robust and faster if the site provided a way to list the available files, so that trial and error were not needed.
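If the server does expose a directory listing, the names could be extracted from it directly. The following is only a sketch under that assumption: the `html` text is a made-up stand-in for what `webread(baseurl)` might return, and the `href="..."` layout of the index page is hypothetical.

```matlab
% Hypothetical index page; on a real site this text might come from
%   html = webread(baseurl);
% provided the server returns an HTML listing (an assumption).
html = ['<a href="A_20080403.xml">A_20080403.xml</a>' ...
        '<a href="A_20080403_1.xml">A_20080403_1.xml</a>' ...
        '<a href="B_20080403_1.xml">B_20080403_1.xml</a>'];

% Capture every linked name that starts with "A_" and follows the naming rule.
tokens = regexp(html, 'href="(A_\d{8}(?:_\d+)?\.xml)"', 'tokens');

% Flatten the nested cells into a string array and drop repeated links.
files = unique(string([tokens{:}]), 'stable');
disp(files)
```

Each entry of `files` could then be passed to `websave` together with the base URL. Without such a listing, trial and error over candidate names remains the fallback.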
baseurl = "https://www.somecompany.com/xml/";
% 'Format' controls how string(Day) renders each date, so it must match the filenames
datelimits = datetime({'20080401', '20100101'}, 'InputFormat', 'yyyyMMdd', 'Format', 'yyyyMMdd');
subfile_limit = 5;  %no more than _5 -- adjust as appropriate
% suffixes to try for each date: ".xml", "_1.xml", ..., "_5.xml"
subfile_modifier = ["", "_" + (1:subfile_limit)] + ".xml";
for Day = datelimits(1):datelimits(2)
    daystr = string(Day);
    for Sub = subfile_modifier
        filename = "A_" + daystr + Sub;
        url = baseurl + filename;
        try
            outfilename = websave(filename,url);
            fprintf('fetched %s\n', filename);
        catch
            break;  %skip remaining subfiles for this date upon first failure
        end
    end
end
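One caveat about the `break`: the example listing in the question shows A_20080404_1.xml without an A_20080404.xml, so stopping at the first failure for a date might skip numbered files that do exist. Whether this matters depends on the site. A possible variation, sketched here over a two-day range with a hypothetical limit of 2, builds every candidate name first and uses `continue` so that each one is tried. The `doDownload` flag is an illustrative switch so the name generation can be checked without contacting the server.

```matlab
% Two-day range for illustration; 'Format' makes string(d) match the filenames.
days = datetime(2008, 4, 3):datetime(2008, 4, 4);
days.Format = 'yyyyMMdd';

subfile_limit = 2;   % hypothetical small limit for this sketch
suffixes = ["", "_" + (1:subfile_limit)] + ".xml";

% Collect all candidate names, day by day.
candidates = strings(1, 0);
for d = days
    candidates = [candidates, "A_" + string(d) + suffixes]; %#ok<AGROW>
end
disp(candidates)

doDownload = false;  % illustrative switch: set to true to contact the server
if doDownload
    baseurl = "https://www.somecompany.com/xml/";
    for name = candidates
        try
            websave(name, baseurl + name);
            fprintf('fetched %s\n', name);
        catch
            continue;  % missing file: move on to the next candidate
        end
    end
end
```

The trade-off is more requests: every date costs `subfile_limit + 1` attempts, instead of stopping at the first miss.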
2 Comments
Walter Roberson on 12 Mar 2022
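Creating datelimits with 'Format', 'yyyyMMdd' makes string(Day) produce text such as 20080403, which matches the filenames:
datelimits = datetime({'20080401', '20100101'}, 'InputFormat', 'yyyyMMdd', 'Format', 'yyyyMMdd');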