- only the substring between the outermost underscores
- all the digits
- only the capitalized letters
Converting rough strings to exact strings
Gabriel Stanley
on 7 Sep 2022
Commented: Gabriel Stanley
on 8 Sep 2022
I have a string array of filenames that are named in a semi-consistent manner, e.g.:
AllFiles
AllFiles =
4x1 string array
"textIdontCareAbout_Phenolic32_Group5_textIdontCareAbout"
"textIdontCareAbout_P1_textIdontCareAbout"
"textIdontCareAbout_Epx2_G3_textIdontCareAbout"
"textIdontCareAbout_Epoxy_105_textIdontCareAbout"
I'm trying to figure out how to extract the inconsistent substrings of interest (the text between the "textIdontCareAbout" parts) and convert them into a consistent format, e.g.:
AllFiles
AllFiles =
4x1 string array
"P32G5"
"P1"
"E2G3"
"E105"
I had been avoiding regexp, but having caved and decided to work with it, I'm trying to find an elegant way to do this conversion. At present, the only approach I can see working is to manually check for each phrasing style that I find by searching through the data I currently have.
Is there a better way to go about this, or failing that, some suggestions on how to define the regexp so that as few searches as possible are needed?
4 Comments
Paul
on 8 Sep 2022
There may be a way to do this with string operations. It's hard to tell without knowing the rule(s) for what to keep and what to discard from each string. For example, supposing that @the cyclist has the correct rules, one could do:
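AllFiles = [
"textIdontCareAbout_Phenolic32_Group5_textIdontCareAbout"
"textIdontCareAbout_P1_textIdontCareAbout"
"textIdontCareAbout_Epx2_G3_textIdontCareAbout"
"textIdontCareAbout_Epoxy_105_textIdontCareAbout"]
% keep only the text after the first underscore
AllFiles = extractAfter(AllFiles,"_")
% reverse, drop everything up to the (originally) last underscore, reverse back
AllFiles = reverse(extractAfter(reverse(AllFiles),"_"))
% logical masks of uppercase letters and of digits (one cell per string)
upperchars = isstrprop(AllFiles,'upper')
digitchars = isstrprop(AllFiles,'digit')
% keep only the characters flagged by either mask
AllFiles = arrayfun(@(a,b,c)(string(a{1}(find(b{:} | c{:})))),cellstr(AllFiles),upperchars,digitchars)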
I don't know; maybe regexp will be better or easier (I've never been able to wrap my head around regular expressions and patterns).
Accepted Answer
Stephen23
on 8 Sep 2022
S = [...
"textIdontCareAbout_Phenolic32_Group5_textIdontCareAbout"
"textIdontCareAbout_P1_textIdontCareAbout"
"textIdontCareAbout_Epx2_G3_textIdontCareAbout"
"textIdontCareAbout_Epoxy_105_textIdontCareAbout"];
T = regexp(S,'_.+_','match','once');
T = regexprep(T,'[^A-Z\d]','')
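If fewer calls are the goal, the two steps above can probably be folded into a single regexprep call. This is a minimal sketch, not part of the accepted answer, assuming (as the answer does) that the wanted text always lies between the first and the last underscore. The pattern deletes the leading text up to the first underscore, the trailing text after the last underscore, and every remaining character that is not an uppercase letter or a digit.

```matlab
S = [...
    "textIdontCareAbout_Phenolic32_Group5_textIdontCareAbout"
    "textIdontCareAbout_P1_textIdontCareAbout"
    "textIdontCareAbout_Epx2_G3_textIdontCareAbout"
    "textIdontCareAbout_Epoxy_105_textIdontCareAbout"];
% Three alternatives, each replaced with nothing:
%   ^[^_]*_    prefix up to and including the first underscore
%   _[^_]*$    suffix from the last underscore to the end
%   [^A-Z\d]   any other character that is not an uppercase letter or digit
T = regexprep(S,'^[^_]*_|_[^_]*$|[^A-Z\d]','')
```

One difference worth checking: for a name with fewer than two underscores, the two-step version finds no match at all, whereas this single pattern still strips whatever it can, so the two-step version may be preferable if such names should stand out.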
3 Comments
Stephen23
on 8 Sep 2022
Edited: Stephen23
on 8 Sep 2022
Why are you nesting this character vector in a superfluous scalar cell array?:
SingleExpression = {'(E(poxy|px)?|P(hen(olic)?)?)(_)?\d{1,3}(_G(rp|roup)?(_)?\d{1,2})?'};
"However when trying to run a second invocation of regexp on the resulting array temp, MatLab threw the fault... The solution to the above was to have an intermediate cellstr operation on temp:"
The error occurs because you did not use the ONCE option, as shown in my answer, so your code adds an extra layer of nested cell arrays. Rather than adding extra commands (e.g. CELLSTR) the simple and efficient solution is to specify the ONCE option, just as I showed you:
temp = regexpi(AllFiles,SingleExpression,'match','once')
% ^^^^^^ simply specify this
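To illustrate the point above, here is a minimal sketch that passes the pattern from the question as a plain character vector (no enclosing cell) together with 'once'. With a string array input, the match output should then be a string array of the same size, so a second call can consume it directly without an intermediate cellstr. The final regexprep step that turns the matched phrases into the compact codes is an editorial assumption, not something from the thread; it relies on each code starting with an uppercase letter, as in the sample data.

```matlab
AllFiles = [...
    "textIdontCareAbout_Phenolic32_Group5_textIdontCareAbout"
    "textIdontCareAbout_P1_textIdontCareAbout"
    "textIdontCareAbout_Epx2_G3_textIdontCareAbout"
    "textIdontCareAbout_Epoxy_105_textIdontCareAbout"];
% Pattern from the question, as a character vector rather than a scalar cell
SingleExpression = '(E(poxy|px)?|P(hen(olic)?)?)(_)?\d{1,3}(_G(rp|roup)?(_)?\d{1,2})?';
% 'once' returns only the first match per element: a 4x1 string array, no nested cells
temp = regexpi(AllFiles,SingleExpression,'match','once')
% Assumed second step: keep only uppercase letters and digits
% (case-sensitive, unlike regexpi above)
codes = regexprep(temp,'[^A-Z\d]','')
```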