How should I fix my regular expression to parse this txt file?

This is the part of my code that reads the attached text file and uses regular expressions to extract the file name between 'subsystems.tbl\' and '.sub' for a given 'sub_sys' (Major Role) and 'location' (Minor Role).
if ismember(sub_sys, {'spr', 'dpr', 'bum', 'reb'})
    % Property files: capture the name between the last slash and the extension
    block_pattern = ['\/([^\/]+)\.', sub_sys];
elseif strcmp(sub_sys, 'susp') % strcmp compares the whole name, not character by character
    block_pattern = ['\$[-]+\s*SUBSYSTEM[\s\S]*?Major Role : suspension','[\s\S]*?Minor Role : ', location, '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems.tbl/([^'']+)\.sub'''];
elseif ismember(sub_sys, {'steering', 'wheel'})
    block_pattern = ['\$[-]+\s*SUBSYSTEM[\s\S]*?Major Role : ', sub_sys, '[\s\S]*?Minor Role : ', location, '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems.tbl/([^'']+)\.sub'''];
elseif strcmp(sub_sys, 'tir')
    block_pattern = ['PROPERTY_FILE\s*=\s*''[^'']+\/([^\/]+)\.tir'''];
end
% Return the captured file name (token) of the first match only
name_tokens = regexp(file_content, block_pattern, 'tokens', 'once', 'dotexceptnewline');
It works for the front suspension system (susp, spr, dpr, bum, reb, steering, wheel, tir) and returns the correct paths, but for the rear suspension system my code returns rr_susp_path = 'AA_TCAR_WHEEL_RR_22inch' instead of rr_susp_path = 'AA_TCAR_SUSP_RR_RWS_230607'.
It seems that my regular expression is way too broad and causing this problem. How should I fix my regular expression?

Accepted Answer

Stephen23 on 18 Apr 2024
Edited: Stephen23 on 18 Apr 2024
"It seems that my regular expression is way too broad and causing this problem."
There are several locations where your regular expression matches unlimited amounts of (almost) anything:
  • [^'']+
  • [^>]+
  • [\s\S]*
I doubt that you really want unlimited matches like that.
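To illustrate why a lazy `[\s\S]*?` still over-matches, here is a minimal sketch on a made-up three-block text. The block layout, the `<db>` prefix and the file names SUSP_FR, WHEEL_RR and SUSP_RR are assumptions for illustration only; they are not taken from the attached file. The anchored pattern is a simplified variant that uses a token rather than the look-behind shown further down.

```matlab
% Hypothetical sample text: front suspension, rear wheel, rear suspension.
lines = {
    '$ Major Role : suspension'
    '$ Minor Role : front'
    'USAGE = ''<db>/subsystems.tbl/SUSP_FR.sub'''
    '$ Major Role : wheel'
    '$ Minor Role : rear'
    'USAGE = ''<db>/subsystems.tbl/WHEEL_RR.sub'''
    '$ Major Role : suspension'
    '$ Minor Role : rear'
    'USAGE = ''<db>/subsystems.tbl/SUSP_RR.sub'''
    };
txt = sprintf('%s\n', lines{:});

% Broad pattern: the match starts at the FIRST "Major Role : suspension"
% (front block), then [\s\S]*? runs on until the first "Minor Role : rear",
% which belongs to the wheel block.
loose = 'Major Role : suspension[\s\S]*?Minor Role : rear[\s\S]*?/subsystems\.tbl/([^'']+)\.sub';
tok_loose = regexp(txt, loose, 'tokens', 'once', 'dotexceptnewline');
disp(tok_loose{1})    % WHEEL_RR

% Anchored pattern: the two role lines must be directly adjacent, and
% with 'dotexceptnewline' the dot cannot run past the USAGE line.
strict = '\$\s+Major\s+Role\s+:\s+suspension\s+\$\s+Minor\s+Role\s+:\s+rear\s+USAGE\s+=.+?(\w+)\.sub';
tok_strict = regexp(txt, strict, 'tokens', 'once', 'dotexceptnewline');
disp(tok_strict{1})   % SUSP_RR
```

The broad pattern never fails outright, because the regex engine is free to pick the Major Role from one block and the Minor Role from another; only `\s+` between the two role lines forces them to come from the same block.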
"How should I fix my regular expression?"
Perhaps something like this:
pf1 = 'suspension';
pf2 = 'rear';
tmp = strcat('\$\s+',{'Major';'Minor'},'\s+Role\s+:\s+',{pf1;pf2},'\s+');
rgx = ['(?<=',tmp{:},'(\$.+\s+)*USAGE\s+=.+?)\w+\.sub']
rgx = '(?<=\$\s+Major\s+Role\s+:\s+suspension\s+\$\s+Minor\s+Role\s+:\s+rear\s+(\$.+\s+)*USAGE\s+=.+?)\w+\.sub'
str = fileread('test_example.txt');
out = regexp(str,rgx,'match','once','dotexceptnewline')
out = 'AA_TCAR_SUSP_RR_RWS_230607.sub'
  1 Comment
Munho Noh on 19 Apr 2024
Hello Stephen, your answers are always helpful, thank you.
I modified your answer slightly, as follows, to capture only the file name without the .sub extension.
block_pattern = ['(?<=\$\s+Major\s+Role\s+:\s+', sub_sys, '\s+\$\s+Minor\s+Role\s+:\s+', location, '\s+(\$.+\s+)*USAGE\s+=.+\/)(\w+)(?=\.sub)'];
Thank you for your good advice.
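A parameterized pattern like the one in this comment could also be built with the inputs escaped and with the file name returned as a token instead of through look-arounds. The following is only a sketch under assumptions: `esc` and `makePattern` are hypothetical helper names, the sample text is made up, and the role names must be passed as they appear in the file (for example 'suspension', not 'susp').

```matlab
% Hypothetical helpers (not from the thread).
% regexptranslate('escape', ...) protects against regex metacharacters
% in the role names; (?:...) keeps the comment lines out of the tokens.
esc = @(s) regexptranslate('escape', s);
makePattern = @(major, minor) [ ...
    '\$\s+Major\s+Role\s+:\s+', esc(major), ...
    '\s+\$\s+Minor\s+Role\s+:\s+', esc(minor), ...
    '\s+(?:\$.+\s+)*USAGE\s+=.+\/(\w+)\.sub'];

% Made-up sample text with an extra "$" comment line before USAGE.
lines = {
    '$ Major Role : suspension'
    '$ Minor Role : front'
    'USAGE = ''<db>/subsystems.tbl/SUSP_FR.sub'''
    '$ Major Role : suspension'
    '$ Minor Role : rear'
    '$ Some other comment line'
    'USAGE = ''<db>/subsystems.tbl/SUSP_RR.sub'''
    };
txt = sprintf('%s\n', lines{:});

tok = regexp(txt, makePattern('suspension', 'rear'), ...
    'tokens', 'once', 'dotexceptnewline');
if isempty(tok)
    name = '';          % no block with this Major/Minor combination
else
    name = tok{1};      % file name without the .sub extension
end
disp(name)

% A combination that is absent from the text yields an empty token list.
missing = regexp(txt, makePattern('wheel', 'rear'), ...
    'tokens', 'once', 'dotexceptnewline');
```

Checking `isempty` before indexing avoids an indexing error when a subsystem is not present in the file.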

Release

R2019b
