Fastest way to search files by pattern name

I have a main folder with a lot of subfolders (thousands). I want to load files from only specific subfolders, which can be identified by a pattern in the subfolder name. Each of those subfolders in turn contains tens of sub-subfolders, and again I only need specific ones, which can also be identified by a pattern in the name. To extract the files I need, I have implemented two approaches using the dir function: 1) a single call, using the whole path with wildcards for the subfolders and sub-subfolders; 2) first searching for all matching subfolders, then searching for the sub-subfolders in a for loop over those subfolders. It turns out that the latter is much faster. Could you explain why?
% First way: one dir call with wildcards at both folder levels
files = dir(fullfile(main_folder,'*_data/*_file_to_load/file1.mat'));
% Second way: list the matching subfolders first, then search inside each one
subfolders = dir(fullfile(main_folder,'*_data/'));
files = cell(1,numel(subfolders));
for i = 1:numel(subfolders)
    files{i} = dir(fullfile(subfolders(i).folder,subfolders(i).name,'*_file_to_load/file1.mat'));
end
  6 Comments
Anton Baranikov on 16 Apr 2023
@Rik, yes, I got it. However, the timings are measured properly (the semicolons are present).
Image Analyst on 16 Apr 2023
@Anton Baranikov did you overlook the Answer below in the official Answer section of the page? Did you only see the comments up here at the top where people are not giving answers but are asking for clarification of the question? If you saw my Answer below, then explain why it doesn't work, or let me know that it did work.


Accepted Answer

dpb on 17 Apr 2023
Edited: dpb on 17 Apr 2023
As for the original question, it's owing to how the underlying OS processes the dir command. When you ask for a directory listing of a chain of subdirectories from a higher level, those aren't necessarily stored on disk in the sequence in which they appear in the pattern, so the dir command has to traverse the whole directory structure from the top until it gets all the way to the bottom. It also doesn't know where the matches may stop, so it has to visit everything possibly reachable from the topmost location.
In the second case, you're giving it the starting point underneath the specific folder, and that chain to the bottom is undoubtedly only one level deep. It's just not doing nearly as much work in the second case as it must do in the first.
The fastest way will be to keep each search as shallow as your a priori knowledge of the structure allows. Several shallow searches will virtually always beat one deep one.
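A minimal sketch of the staged, shallow search described above, not a definitive implementation. It assumes the two-level layout from the question (`*_data` folders containing `*_file_to_load` folders with a `file1.mat`), and it builds a small throwaway tree under `tempname` so it can run anywhere; the folder names `a_data`, `b_skip`, `x_file_to_load` and `x_other` are hypothetical. Each `dir` call lists exactly one folder, and the name filtering is done in MATLAB with `endsWith` rather than with wildcards spanning several levels.

```matlab
% Build a tiny throwaway tree (hypothetical names) so the sketch is runnable.
main_folder = tempname;
mkdir(fullfile(main_folder, 'a_data', 'x_file_to_load'));
mkdir(fullfile(main_folder, 'a_data', 'x_other'));
mkdir(fullfile(main_folder, 'b_skip', 'y_file_to_load'));
s = 1;
save(fullfile(main_folder, 'a_data', 'x_file_to_load', 'file1.mat'), 's');

% Level 1: a single shallow listing of the main folder, filtered by name.
level1 = dir(main_folder);
level1 = level1([level1.isdir] & endsWith({level1.name}, '_data'));

files = {};
for i = 1:numel(level1)
    sub = fullfile(level1(i).folder, level1(i).name);

    % Level 2: a single shallow listing of each matching subfolder.
    level2 = dir(sub);
    level2 = level2([level2.isdir] & endsWith({level2.name}, '_file_to_load'));

    for j = 1:numel(level2)
        % The file name is known in advance, so test for it directly.
        candidate = fullfile(sub, level2(j).name, 'file1.mat');
        if isfile(candidate)
            files{end+1} = candidate; %#ok<AGROW>
        end
    end
end

disp(files(:));
rmdir(main_folder, 's'); % Remove the throwaway tree
```

Because the file name is fixed, the innermost step needs no directory listing at all, only an `isfile` check. Whether this beats the loop in the question on a given file system is something to measure rather than assume.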
  2 Comments
Anton Baranikov on 17 Apr 2023
Perfect, that is exactly what I wanted to know!
dpb on 17 Apr 2023
You'll trade some coding complexity and thinking about the actual data structure for better performance this way. The one-time investment may well pay off in the long run if it's a case that will occur often, particularly if you can also automate the generation of the order structure programmatically.


More Answers (1)

Image Analyst on 16 Apr 2023
Use contains to see if the pattern is in the folder or file name. Process the ones you want, and skip the ones you don't want by calling continue.
if contains(thisSubFolderName, 'patternIDoNotWant')
    continue % Skip to the next iteration of the for loop
end
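For context, here is a minimal sketch of that check inside a complete loop. The folder names and the `'backup'` pattern are hypothetical stand-ins for what a `dir` listing might return; only `contains` and `continue` come from the answer itself.

```matlab
% Hypothetical subfolder names standing in for the names from a dir() listing.
subFolderNames = {'run1_data', 'run2_backup', 'run3_data'};

kept = {};
for k = 1:numel(subFolderNames)
    thisSubFolderName = subFolderNames{k};

    % Skip any folder whose name contains the unwanted pattern.
    if contains(thisSubFolderName, 'backup')
        continue % Skip to the next iteration of the for loop
    end

    % Only folders that passed the check reach this point.
    kept{end+1} = thisSubFolderName; %#ok<AGROW>
end

disp(kept); % Shows 'run1_data' and 'run3_data'
```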
  4 Comments
dpb on 17 Apr 2023
Edited: dpb on 17 Apr 2023
"...or you could try using ismember"
Actually, contains (and friends) work the same way...
if contains(thisSubFolderName, 'patternIWant1') || contains(thisSubFolderName, 'patternIWant2') || contains(thisSubFolderName, 'patternIWant3')
could be written as
if contains(thisSubFolderName, {'patternIWant1','patternIWant2','patternIWant3'})
You have to be careful with contains, however, and make sure it is the comparison you want, because it matches the pattern as a substring anywhere within the searched string.
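A small sketch of that substring pitfall, using hypothetical folder names. It compares `contains` with two stricter alternatives, `endsWith` and an anchored `regexp`; the `run<number>_data` naming scheme is an assumption made for illustration only.

```matlab
% Hypothetical folder names; only the first one is really wanted.
names = {'run1_data', 'run1_data_old', 'metadata_notes'};

% contains matches the pattern anywhere in the name, so all three match.
loose = contains(names, 'data');            % [1 1 1]

% endsWith only accepts names that finish with the pattern.
suffixOnly = endsWith(names, '_data');      % [1 0 0]

% An anchored regular expression pins down the whole name.
hits = regexp(names, '^run\d+_data$', 'once');
exact = ~cellfun(@isempty, hits);           % [1 0 0]

disp([loose; suffixOnly; exact]);
```

Which test is appropriate depends on how the folders are actually named; `startsWith` is the corresponding check for prefixes.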
