Regexp expression to handle changing format
%dummy data
% t,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501
% t,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501
S=fileread(filename);
myexpression = ['(?<tvar>w*,'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\w*\.*\w*),'...
'(?<HNL>\w*),'...
'(?<codeTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*,'... % <== This line handles the first line of dummy data
'(?<caprTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*\s*\d*,'... % <== This line handles the first line of dummy data
'(?<logAt>\w*\.*\w*']
parts = regexp(filtered,myexpression,'names')
The third-to-last and second-to-last variables (codeTm, caprTm) change format within the data. How can I modify the expression, or add logic, to accept 2 to 3 space-separated values in the variable "codeTm" and 3 to 4 space-separated values in the variable "caprTm"?
2 space-separated values: (000 00:00:00.00)
3 space-separated values: (000 00 00:00:00.00) or (343 19:54:20.684 8)
4 space-separated values: (21 343 19:54:20.684 8)
Thank you for the help. My apologies for making my expression so complicated. I am still learning the ins and outs of expression formats for regexp to read data.
2 Comments
Stephen23
on 7 Mar 2022
It is not clear why you are using regular expressions to import this data: READTABLE et al. have options for handling missing field data. Have you considered using the inbuilt data importing functions?
Answers (1)
Stephen23
on 7 Mar 2022
Edited: Stephen23
on 7 Mar 2022
You can easily make a group optional or occur a specific number of times using any suitable quantifier, for example:
(..)? % zero or one time
(..)* % zero or more times
(..){2,4} % two to four times
etc.
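As a minimal sketch of the quantifier idea, the pattern below accepts either one or two leading space-terminated digit groups before the HH:MM:SS part. The variable names (two, three, rgx, isTwo, isThree) and the pattern itself are illustrative assumptions, not taken from the original code; the sample strings are the "codeTm" values from the dummy data in the question.

```matlab
% Sample "codeTm" values from the dummy data in the question.
two   = '000 00:00:00.00';     % 2 space-separated values
three = '000 00 00:00:00.00';  % 3 space-separated values

% (\d+\s){1,2} : one or two digit groups, each followed by whitespace
% \d+:\d+:\d+\.\d+ : the HH:MM:SS.ss part
rgx = '^(\d+\s){1,2}\d+:\d+:\d+\.\d+$';

% regexp with 'once' returns the start index of the match, or [] if none.
isTwo   = ~isempty(regexp(two,   rgx, 'once'));
isThree = ~isempty(regexp(three, rgx, 'once'));
```

Changing {1,2} to another range is how the same pattern could be adapted to a field with a different number of optional groups.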
However, rather than trying to match each group of characters individually, I would take the simpler approach of matching a set of allowed characters (a character class), which accepts any number of space-separated values within a field. I also had to fix several other bugs in your regular expression to get this working, mostly missing backslashes and parentheses.
str = fileread('test.txt')
rgx = ['^\s*(?<tvar>\w*),'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\d*\.?\d*),'...
'(?<HNL>\w*),'...
'(?<codeTm>[ :\w\.]*),'...
'(?<caprTm>[ :\w\.]*),'...
'(?<logAt>\d*\.?\d*)'];
parts = regexp(str,rgx,'names','lineanchors')
parts.codeTm
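To try the same expression without a file, the two dummy lines from the question can be placed in a character vector directly. This is only a sketch: it assumes the real file holds the lines without the leading "% " comment marker, and the loop variable k is introduced here for illustration.

```matlab
% The two dummy lines from the question, joined by a newline.
str = sprintf('%s\n%s', ...
    't,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501', ...
    't,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501');

% Same expression as above: the time fields are matched as a run of
% spaces, colons, word characters, and periods, up to the next comma.
rgx = ['^\s*(?<tvar>\w*),'...
    '(?<tmCodeRdr>\w*),'...
    '(?<tmCodLvl>\d*\.?\d*),'...
    '(?<HNL>\w*),'...
    '(?<codeTm>[ :\w\.]*),'...
    '(?<caprTm>[ :\w\.]*),'...
    '(?<logAt>\d*\.?\d*)'];

% 'lineanchors' makes ^ match at the start of every line, so each line
% of the text becomes one element of the struct array.
parts = regexp(str, rgx, 'names', 'lineanchors');

for k = 1:numel(parts)
    fprintf('%d: codeTm = "%s", caprTm = "%s"\n', ...
        k, parts(k).codeTm, parts(k).caprTm);
end
```

Because the character class does not count the space-separated values, the same expression covers both line formats.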
But personally I would not try to reinvent the wheel for such a data file. READTABLE is much simpler:
tbl = readtable('test.txt','delimiter',',');
tbl.Properties.VariableNames = {'tvar','tmCodeRdr','tmCodLvl','HNL','codeTm','caprTm','logAt'}
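A self-contained way to try the READTABLE route is to write the two dummy lines to a temporary file first. This is a sketch under assumptions: the temporary file name (fname) is hypothetical, and 'ReadVariableNames',false is added here because the dummy data has no header row, which is not stated in the original answer.

```matlab
% Write the two dummy lines to a temporary text file (hypothetical name).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    't,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501', ...
    't,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501');
fclose(fid);

% Split on commas only; spaces inside a field stay part of that field.
% Assumption: no header row, so the first line must be read as data.
tbl = readtable(fname, 'Delimiter', ',', 'ReadVariableNames', false);
tbl.Properties.VariableNames = ...
    {'tvar','tmCodeRdr','tmCodLvl','HNL','codeTm','caprTm','logAt'};

delete(fname);  % remove the temporary file
```

The column types that READTABLE detects for the two time fields may vary between releases, so it is worth checking them before processing the data further.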