regexp to filter file names

chlor thanks on 5 Jul 2016
Commented: Image Analyst on 5 Jul 2016
I have files such as the following:
s = {'HI_B2_TTTT9_Default452_07052016.xlsx'
     'HI_H2G_TTTT7_Default259_070516.xlsx'
     'HI_B2C_TTTT9_Default1482_070516.xlsx'
     'HI_A1C_TTTT4_468_070516.xlsx'
     'HI_G1C_TTTT8_862_07052016.xlsx'
     'HI_KA6_TTTT4_148_07052016.xlsx'
     'HI_8C_TTTT7_279_Potato_07052016.xlsx'};
I only want to process the first six files and filter out the last one, whose format differs from the first six. Note that even though some of the names do not contain "Default", those files are still considered default because they do not mention "Potato" or any other keyword.
I would rather not filter on the keyword "Potato", because files containing other keywords such as "Carrot" or "Bacon" (I don't know yet what they will be) may be added to this cell array in the future. Those files would then not be filtered out as I want them to be.
Actually, I think I figured out the code after looking at your answers. I used:
find(cell2mat(regexp(s,'HI_\w+_TTTT\d_(Default)?\d+_\d+')))
Thank y'all for all the inspiration!!
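One caveat with `find(cell2mat(regexp(...)))`: `regexp` returns an empty `[]` for each non-matching name, and `cell2mat` silently drops those empties, so `find` only returns the correct positions when the non-matching files happen to be at the end of the list. A minimal sketch of a position-safe alternative, assuming the same naming scheme, builds a logical mask with `cellfun(@isempty, ...)`. It also anchors the pattern with `^` and `\.xlsx$` (an addition, not part of the code above) so that a keyword appended after the date cannot be skipped over.

```matlab
% File names from the question (the last one should be rejected)
s = {'HI_B2_TTTT9_Default452_07052016.xlsx'
     'HI_H2G_TTTT7_Default259_070516.xlsx'
     'HI_B2C_TTTT9_Default1482_070516.xlsx'
     'HI_A1C_TTTT4_468_070516.xlsx'
     'HI_G1C_TTTT8_862_07052016.xlsx'
     'HI_KA6_TTTT4_148_07052016.xlsx'
     'HI_8C_TTTT7_279_Potato_07052016.xlsx'};

% Anchored pattern: the whole name must fit the "default" layout
pattern = '^HI_\w+_TTTT\d_(Default)?\d+_\d+\.xlsx$';

% With 'once', regexp returns one start index (or []) per cell,
% so the result has the same size as s
startIdx = regexp(s, pattern, 'once');

% Logical mask aligned with s: true where the name matched
isDefault = ~cellfun(@isempty, startIdx);

idx = find(isDefault);        % positions of the matching files
defaultFiles = s(isDefault);  % the matching names themselves
```

Because the mask keeps one entry per file, the indices stay correct even if a keyword file is inserted in the middle of the list.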

Accepted Answer

Azzi Abdelmalek on 5 Jul 2016
s={'HI_A1C_TTTT4_468_07052016.xlsx'
'HI_B2_TTTT9_Default452_070516.xlsx'
'HI_GA1C_TTTT8_862_07052016.xlsx'
'HI_HB2C_TTTT7_Default259_070516.xlsx'
'HI_KA6_TTTT4_148_07052016.xlsx'
'HI_B2C_TTTT9_Default1482_070516.xlsx'
'HI_8C_TTTT7_279_Potato.xlsx'}
% Return the first match in each cell as a char vector ('' if no match)
out=regexp(s,'\w+_\w+_\w+_(Default)?\d+_\d+','match','once')
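With `'match','once'`, `out` is a cell array the same size as `s` that holds the matched text, or an empty char `''` where nothing matched (here, the Potato file). A minimal follow-up sketch to keep only the matching names; the variable names `keep` and `selected` are just illustrative.

```matlab
% Cell array from the accepted answer (the last name has a keyword)
s = {'HI_A1C_TTTT4_468_07052016.xlsx'
     'HI_B2_TTTT9_Default452_070516.xlsx'
     'HI_GA1C_TTTT8_862_07052016.xlsx'
     'HI_HB2C_TTTT7_Default259_070516.xlsx'
     'HI_KA6_TTTT4_148_07052016.xlsx'
     'HI_B2C_TTTT9_Default1482_070516.xlsx'
     'HI_8C_TTTT7_279_Potato.xlsx'};
out = regexp(s, '\w+_\w+_\w+_(Default)?\d+_\d+', 'match', 'once');

% Non-matching names come back as '', so isempty flags them
keep = ~cellfun(@isempty, out);
selected = s(keep);   % the six "default" file names
```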

More Answers (1)

Image Analyst on 5 Jul 2016
What's unique about the filenames you want to keep? Do they all end in 16 like in your small sample? If so, do:
fileStruct = dir('*16.xlsx');
Now, just use fileStruct(k).name in your loop or wherever you need to reference the filename.
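A minimal sketch of such a loop, assuming the files sit in the current folder (`pwd`); the processing step is left as a placeholder comment.

```matlab
% Assumption: the .xlsx files are in the current folder
folder = pwd;
fileStruct = dir(fullfile(folder, '*16.xlsx'));

for k = 1:numel(fileStruct)
    % Full path to the k-th file, usable from any working folder
    fullName = fullfile(folder, fileStruct(k).name);
    fprintf('Processing %s\n', fullName);
    % ... read and process the file here
end
```

Building the full path with `fullfile` keeps the loop working even if the current folder changes while processing.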
  2 Comments
chlor thanks on 5 Jul 2016
Thank you for providing another insight to do this!
However, it will not work well in my particular case. (I fixed that small detail in my updated question; I made up the sample names so that I can adapt the code myself later.)
The filenames have a fixed structure. Taking 'HI_A1C_TTTT4_468_07052016.xlsx' as an example:
HI may stand for a particular program name
A1C may stand for a particular operation within it
TTTT4 stands for the person who performed this operation
468 stands for the task number
07052016 is the date the file was created (sometimes it is 070516 and sometimes 07052016, depending on how the person felt when saving the file...)
So the purpose of this regexp is to pick these files out of the hundreds of other files I have. I will later parse the fields using "split", but that's a different story...
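As an alternative to `split` (a swapped-in technique, not what the poster described), `regexp` with named tokens can extract every field in one step and also cope with both date formats. In this sketch the field names `program`, `operation`, `operator`, `task` and `date` are hypothetical labels taken from the description above; the `TTTT\d+` operator format and the assumption that two-digit years mean 20xx are also assumptions.

```matlab
name = 'HI_B2C_TTTT9_Default1482_070516.xlsx';

% Named tokens for each part of the name; "Default" is optional and not captured
pattern = ['^(?<program>[A-Za-z]+)_(?<operation>[A-Za-z0-9]+)_' ...
           '(?<operator>TTTT\d+)_(?:Default)?(?<task>\d+)_' ...
           '(?<date>\d{8}|\d{6})\.xlsx$'];
info = regexp(name, pattern, 'names');

% Normalize a 6-digit date (mmddyy) to 8 digits (mmddyyyy),
% assuming two-digit years belong to the 2000s
if numel(info.date) == 6
    info.date = [info.date(1:4) '20' info.date(5:6)];
end
disp(info)
```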
Image Analyst on 5 Jul 2016
OK, though I'm still not sure what constitutes a good filename and a bad one. If it's just the presence of some list of keywords defined in advance, you could identify which strings in a cell array of filenames contain any of the keywords. Note that ismember only finds exact, whole-string matches, so for keywords embedded in a name you need a substring search such as strfind or regexp.
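A minimal sketch of the keyword check, using a hypothetical keyword list. It contrasts `ismember` (whole-string comparison) with a `regexp` substring search built from the keywords; `regexptranslate('escape', ...)` guards against keywords that contain regex special characters.

```matlab
s = {'HI_B2_TTTT9_Default452_07052016.xlsx'
     'HI_8C_TTTT7_279_Potato_07052016.xlsx'};
keywords = {'Potato', 'Carrot', 'Bacon'};   % hypothetical keyword list

% ismember compares whole strings, so it does not find 'Potato' inside a name
exact = ismember(s, keywords);              % false for both names

% Substring search: escape each keyword, then join them into one alternation
escaped = cellfun(@(k) regexptranslate('escape', k), keywords, ...
                  'UniformOutput', false);
kwPattern = strjoin(escaped, '|');          % 'Potato|Carrot|Bacon'
hasKeyword = ~cellfun(@isempty, regexp(s, kwPattern, 'once'));
```

As the question points out, this only catches keywords known in advance; whitelisting the expected layout with an anchored pattern avoids maintaining such a list.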
