Using regexp to extract labels
Dhani Dharmaprani
on 27 Dec 2016
Edited: Stephen23
on 16 Jan 2017
Hi,
I have a file (attached) which includes some header information I am interested in. Specifically, I would like to extract the various channel labels, but can't quite correct the expression in order to obtain all of the label names in the file and nothing else. I have tried
labels = regexp(filetext, '((?<=Label:)(\s*\w*){2}\D*\d*)+', 'match');
although this doesn't quite work due to the first two channel labels being in a different format to the rest. If anyone can offer advice so that I can fix my expression to work for the first two channel labels also, that would be greatly appreciated.
Thank you kindly in advance!
0 Comments
Accepted Answer
Stephen23
on 27 Dec 2016
Edited: Stephen23
on 27 Dec 2016
There is no need to be so specific about the format of the data field. It would be enough to identify the 'Label:' and anything that is not a newline, something like this:
>> str = fileread('test.txt');
>> C = regexp(str,'(?<=Label:\s*)[^\n]*','match');
>> C{:}
ans = I
ans = II
ans = A 1-2
ans = A 3-4
ans = A5-6
ans = A 7-8
ans = B 1-2
ans = B 3-4
ans = B 5-6
ans = B 7-8
ans = C 1-2
ans = C 3-4
ans = C 5-6
ans = C 7-8
ans = D 1-2
ans = D 3-4
ans = D 5-6
ans = D 7-8
ans = CS 5-6
ans = E 1-2
ans = E 3-4
ans = E 5-6
ans = E 7-8
ans = F 1-2
ans = F 3-4
ans = F 5-6
ans = F 7-8
ans = G 1-2
ans = G 3-4
ans = G 5-6
ans = G 7-8
ans = H 1-2
ans = H 3-4
... etc
Or if you want to match that format, something like this:
>> C = regexp(str,'(?<=Label: *)[A-Z]+[\-\d ]*','match'); C{:}
ans = I
ans = II
ans = A 1-2
ans = A 3-4
ans = A5-6
ans = A 7-8
ans = B 1-2
ans = B 3-4
ans = B 5-6
ans = B 7-8
ans = C 1-2
ans = C 3-4
ans = C 5-6
...etc
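Both patterns above can be tried without the attached file. The following is a minimal sketch that assumes a made-up sample string (the real file layout is not shown in this thread). Depending on the input, a match may carry leading whitespace or a trailing carriage return, so strtrim is applied to the results here as a precaution.

```matlab
% Hypothetical sample text standing in for the attached file.
str = sprintf('Label: I\nRate: 1000\nLabel: II\nLabel: A 1-2\nLabel: A5-6\n');

% Generic pattern: everything after 'Label:' up to the end of the line.
generic = strtrim(regexp(str, '(?<=Label:\s*)[^\n]*', 'match'));

% Format-specific pattern: capital letters, then digits, hyphens, or spaces.
specific = strtrim(regexp(str, '(?<=Label: *)[A-Z]+[\-\d ]*', 'match'));

% Compare the two results on this sample.
disp(isequal(generic, specific))
```

On a real file the two patterns could disagree, for example if a label contains lowercase letters or punctuation that the format-specific character classes do not cover.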
If you want to experiment with regular expressions, then try my FEX submission that lets you quickly change and test regular expressions in an interactive figure:
3 Comments
Stephen23
on 16 Jan 2017
Edited: Stephen23
on 16 Jan 2017
To optimize, you could start by getting rid of the lookaround operation, as lookarounds tend to be slow:
C = regexp(str,'(Label: *)([A-Z]+[\-\d ]*)','tokens');
vertcat(C{:})
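With two capturing groups, each token holds both the 'Label: ' prefix and the label, so vertcat returns a two-column cell array. If only the labels are wanted, one possible variant is to capture just the label. This is a sketch on a made-up sample string, not the attached file:

```matlab
% Hypothetical sample text standing in for the attached file.
str = sprintf('Label: I\nLabel: II\nLabel: A 1-2\nLabel: A5-6\n');

% Only the label is captured; the 'Label: ' prefix is matched but not returned.
tok = regexp(str, 'Label: *([A-Z]+[\-\d ]*)', 'tokens');

% Each element of tok is a 1-by-1 cell, so this gives an N-by-1 cell array.
labels = strtrim(vertcat(tok{:}));
disp(labels)
```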
But the most efficient solution is likely to avoid regexp entirely and use one of the file import tools (such as textscan) to read the file data into cell arrays, and then use a fast strcmp to locate the desired rows:
opt = {'HeaderLines',1, 'Delimiter','\n'};
fid = fopen('test.txt','rt');
C = textscan(fid,'%[^:]: %[^\n]', opt{:});
fclose(fid);
data = C{1}{end};
C = horzcat(C{1}(1:end-1),C{2});
idx = strcmpi('Label',C(:,1));
out = C(idx,2)
This creates the following output:
out =
'I'
'II'
'A 1-2'
'A 3-4'
'A5-6'
'A 7-8'
'B 1-2'
'B 3-4'
'B 5-6'
'B 7-8'
'C 1-2'
'C 3-4'
'C 5-6'
'C 7-8'
'D 1-2'
'D 3-4'
'D 5-6'
'D 7-8'
'CS 5-6'
'E 1-2'
'E 3-4'
'E 5-6'
'E 7-8'
'F 1-2'
'F 3-4'
'F 5-6'
'F 7-8'
'G 1-2'
'G 3-4'
'G 5-6'
'G 7-8'
'H 1-2'
'H 3-4'
'H 5-6'
'H 7-8'
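The same line-based idea can also be sketched without textscan, by splitting the text into lines and comparing prefixes. This is an illustrative alternative rather than the code above: the sample string and the variable names are assumptions, and it has not been timed against either approach.

```matlab
% Hypothetical sample text standing in for the attached file.
str = sprintf('Title: demo\nLabel: I\nRate: 1000\nLabel: A 1-2\n');

% Split into lines; any trailing carriage return is removed by strtrim below.
lines = strsplit(str, '\n');

% Case-insensitive comparison of the first six characters of every line.
isLabel = strncmpi(lines, 'Label:', 6);

% Drop the 'Label:' prefix, trim whitespace, and return a column cell array.
out = strtrim(cellfun(@(s) s(7:end), lines(isLabel), 'UniformOutput', false)).';
disp(out)
```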
More Answers (1)
José-Luis
on 27 Dec 2016
filetext = fileread('test.txt');
expr = '(?<=Label:\s+)([\w-\s]+)(?=\n)';
hits = regexp(filetext, expr, 'match')