Extract number from file name

There are several files like this: K10_0.0.json, Mig_Thresh_2.0.json, K_5_6.5.json, WC_0.00051.json, ... and I need to extract the number after the last underline which would be 0.0 for K10_0.0.json, 2.0 for Mig_Thresh_2.0.json, 6.5 for K_5_6.5.json and 0.00051 for WC_0.00051.json. In other words, I need to get the number after the last underline (_).
Any idea how to do that?

Accepted Answer

David Hill on 4 Nov 2019
Look at the regexp function. I assume your input is a string array and you want a string array output.
a='K_5_6.5.json';
b=regexp(a,'(?<=[_])\d*[.]?\d*','match');
c=cell2mat(b(end));%c='6.5'

5 Comments

I would replace the first \d* by \d+.
Note that
cell2mat(b(end))
is simply
b{end}
Conversion to an actual number would be:
str2double(b{end})
Zeynab Mousavikhamene on 4 Nov 2019
Edited: Zeynab Mousavikhamene on 4 Nov 2019
Thanks David and Guillaume. Can you please explain what each part of '(?<=[_])\d*[.]?\d*' means? I know what * means.
If you look at the regexp function description on the MathWorks site, it explains this quite well. (?<=[_]) requires the match to come directly after a '_'; \d* matches 0 or more consecutive digits; \d+ matches 1 or more consecutive digits; [.]? matches an optional dot. I recommend changing to:
a='K_5_6.5.json';
b=regexp(a,'(?<=[_])\d+[.]?\d*','match');
c=b{end};%c='6.5'
Guillaume on 4 Nov 2019
Edited: Guillaume on 4 Nov 2019
  • (?<=xxx) is a look-behind expression. Here it means that the match must be preceded by xxx.
  • The xxx here is [_]. [] specifies a set of characters, any one of which may match. I'm not sure why David used that; it's not needed since there's only one character, the _.
  • \d matches any digit (so characters '0' to '9').
  • [.] matches a literal dot. Again, the [] is unnecessary, but outside brackets the dot must be escaped as \. because an unescaped . matches any character. The ? makes the dot optional.
So, a clearer expression would be:
regexp(a,'(?<=_)\d+\.?\d*','match') %match must follow _ and is made of 1 or more digits followed by an optional . and more digits
Note that all the above is explained in the documentation of regexp.
You could also use captures instead of look behind:
regexp(a, '_(\d+\.?\d*)', 'tokens')
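With 'tokens', regexp returns a nested cell array (one cell per match, each holding the captured groups), so an extra level of indexing is needed. A minimal sketch of how the result could be unpacked; the variable names tok, lastStr and lastNum are only illustrative:

```matlab
a = 'K_5_6.5.json';
% One cell per match, each containing a cell with the captured group
tok = regexp(a, '_(\d+\.?\d*)', 'tokens');
% Take the last match and its first (only) captured group
lastStr = tok{end}{1};
% Convert the captured text to a double
lastNum = str2double(lastStr);
```

Here lastStr is the text captured by the last match and lastNum its numeric value.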
Stephen23 on 5 Nov 2019
Edited: Stephen23 on 5 Nov 2019
Note that this answer does not do what the question requested. The question clearly states "I need to get the number after the last underline", but this answer gets the number after every underline, thus it clearly fails your "6.5 for K_5_6.5.json" example by returning both numbers:
>> a = 'K_5_6.5.json';
>> regexp(a,'(?<=[_])\d*[.]?\d*','match')
ans =
'5' '6.5'
Guillaume improved the regular expression, but did not change this behavior.
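One possible way to keep the look-behind style while matching only after the last underscore is to add a negative lookahead that forbids any later underscore. This is a sketch of that idea, not something proposed in the thread, and the names C, S and V are only illustrative:

```matlab
C = {'K10_0.0.json', 'Mig_Thresh_2.0.json', 'K_5_6.5.json', 'WC_0.00051.json'};
% (?<=_)   the match must directly follow an underscore
% (?!.*_)  negative lookahead: no underscore may appear after the match
S = regexp(C, '(?<=_)\d+\.?\d*(?!.*_)', 'match', 'once');
% Convert the matched text of each file name to a double
V = str2double(S);
```

With 'once', each file name yields a single character vector rather than a cell of matches, so str2double can be applied to the whole cell array at once.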


More Answers (1)

Stephen23 on 4 Nov 2019
>> C = {'K10_0.0.json', 'Mig_Thresh_2.0.json', 'K_5_6.5.json', 'WC_0.00051.json'};
>> [~,N] = cellfun(@fileparts,C,'uni',0);
>> D = regexp(N,'\d+\.?\d*$','match','once');
>> V = str2double(D)
V =
0.00000 2.00000 6.50000 0.00051
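The pattern above anchors the number to the end of the name with $, after fileparts has stripped the extension. A hedged variant, assuming you also want to require an underscore directly before the number and to get NaN for names that have no such number, could look like this. The look-behind and the file name 'notes.json' are assumptions added for illustration:

```matlab
% 'notes.json' is a made-up name with no trailing number
C = {'K_5_6.5.json', 'notes.json'};
% Keep only the name part, dropping folder and extension
[~, N] = cellfun(@fileparts, C, 'UniformOutput', false);
% Number must directly follow an underscore and end the name
D = regexp(N, '(?<=_)\d+\.?\d*$', 'match', 'once');
% Names without a match give an empty char, which str2double turns into NaN
V = str2double(D);
```

Testing V with isnan then gives a simple way to spot file names that do not follow the naming scheme.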
