Replacing only certain instances of text within a MATLAB character array
I have a large character array in MATLAB, 'lineDataA', containing many different numbers.
I would like to replace every instance of the number '6002' with '0', apart from the very first instance.
lineData = replace(lineDataA, '6002', '0');
This replaces every instance, including the first.
And
where6002 = strfind(lineDataA, '6002');
Gives the positions of all the instances. However, I am not sure how to replace all the instances except the first.
Many thanks for your help,
Rob
Accepted Answer
Stephen23
on 20 Jan 2017
Edited: Stephen23
on 20 Jan 2017
Method One: split the string
>> str = '___6002__6002___6002___6002__';
>> idx = regexp(str,'6002','once','end');
>> [str(1:idx),strrep(str(idx+1:end),'6002','0')]
ans =
___6002__0___0___0__
Method Two: use a placeholder
>> str = '___6002__6002___6002___6002__';
>> str = regexprep(str,'6002','\b','once');
>> str = strrep(str,'6002','0');
>> regexprep(str,'\b','6002')
ans =
___6002__0___0___0__
Note that the original string must not already contain the placeholder, i.e. the backspace character that \b denotes (char(8)).
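The placeholder requirement can be checked before any replacement is made. The following is only a sketch of Method Two with such a guard added; the variable name `placeholder` and the error message are illustrative assumptions, not part of the original answer.

```matlab
str = '___6002__6002___6002___6002__';
placeholder = char(8);   % backspace, the character that '\b' stands for

% Guard: stop early if the input already contains the placeholder
assert(~any(str == placeholder), 'Input already contains the placeholder.');

str = regexprep(str, '6002', placeholder, 'once');  % protect the first match
str = strrep(str, '6002', '0');                     % replace the remaining matches
str = strrep(str, placeholder, '6002');             % restore the first match
disp(str)
```

Any other character that cannot occur in the data would work equally well as the placeholder.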
Method Three: dynamic regular expression
>> str = '___6002__6002___6002___6002__';
>> regexprep(str,'(.*?6002)(.*)','$1${strrep($2,''6002'',''0'')}')
ans =
___6002__0___0___0__
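The splitting idea from Method One can also start from the strfind positions mentioned in the question. This is a minimal sketch, not the answer author's code; the names `old`, `new`, `pos`, `cut` and `out` are chosen here for illustration. It leaves the input unchanged when there are fewer than two matches.

```matlab
str = '___6002__6002___6002___6002__';
old = '6002';
new = '0';

pos = strfind(str, old);              % start indices of all matches
if numel(pos) < 2
    out = str;                        % zero or one match: nothing to replace
else
    cut = pos(1) + numel(old) - 1;    % index of the last character of the first match
    % Keep everything up to the first match, replace only in the remainder
    out = [str(1:cut), strrep(str(cut+1:end), old, new)];
end
disp(out)
```

Square-bracket concatenation is used here because strcat removes trailing whitespace from character array inputs, which would silently alter a string that ends in spaces.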
2 Comments
John Leal
on 16 Oct 2017
I have a similar problem. I need to replace some words with others in a very large array. I have the code, but it is too slow. Can you help me find a way to make it faster?
% Strip HTML-like tags and digits before the punctuation is removed
textData = regexprep(textData, '<[^<>]+>', ' ');
textData = regexprep(textData, '[0-9]+', ' ');
textData = regexprep(textData, '[@$/#.:&*+=\[\]?!(){},''">_<;%-]', ' ');
% Fold accented vowels before the non-letter filter would delete them
textData = regexprep(textData, 'á', 'a');
textData = regexprep(textData, 'é', 'e');
textData = regexprep(textData, 'í', 'i');
textData = regexprep(textData, 'ó', 'o');
textData = regexprep(textData, 'ú', 'u');
% Remove anything that is not a letter or a space
textData = regexprep(textData, '[^a-zA-Zñ ]', '');
textData = regexprep(textData, 'ñ', 'n');
textData = regexprep(textData, 'x', 's');
textData = regexprep(textData, 'cc', 'c');
textData = regexprep(textData, 'ci', 'si');
% deletedWords = ["helllo","hello";"moter","mother"] ... 50000 rows
% excludedWords = ["father","three", "tree"]... words I don't want to replace
% textData = ["my mother lives with my father";"hello Word"]... 2 million rows.
m = size(deletedWords, 1);
for idx = 1:m
    w_new = deletedWords{idx,1};
    w_ok = deletedWords{idx,2};
    % Only replace words that are not in excludedWords
    if ~any(excludedWords == w_new)
        % Replace exact whole-word matches only
        textData = regexprep(textData, "(?<![\w])" + w_new + "(?![\w])", w_ok);
    end
end
John Leal
on 16 Oct 2017
The main idea is to correct misspelled words in Spanish. It works like a handmade stemmer adjusted to my specific data. deletedWords contains each misspelled word and its correction. These pairs are extracted from textData itself, using Jaro-Winkler similarity to map a less frequent word to a more frequent word that is more than 95% similar.
Ty
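One possible way to avoid running regexprep 50000 times over 2 million rows is to split the text into words once and look every word up in the correction list with ismember. This is only a sketch under stated assumptions: the cleaning step has already reduced the text to letters and single-space separators (so splitting on spaces is equivalent to whole-word matching), no row contains a newline, and the whole word list fits in memory. The names `bad`, `good`, `rowSep`, `words` and `fixed` are illustrative, and the small arrays stand in for the real data.

```matlab
% Small stand-ins for the data described in the comment above
deletedWords  = ["helllo","hello"; "moter","mother"; "tree","three"];
excludedWords = ["father","three","tree"];
textData      = ["my moter lives with my father"; "helllo Word tree"];

% Drop corrections whose misspelled form is in excludedWords
keep = ~ismember(deletedWords(:,1), excludedWords);
bad  = deletedWords(keep,1);
good = deletedWords(keep,2);

% Join all rows into one string, using a newline token as the row separator
rowSep = " " + newline + " ";
words  = split(join(textData, rowSep), " ");

% Look up every word once and substitute the corrected form
[tf, loc] = ismember(words, bad);
words(tf) = good(loc(tf));

% Rebuild the rows from the corrected word list
fixed = split(join(words, " "), rowSep);
disp(fixed)
```

Unlike the loop, this makes a single pass, so a correction whose result is itself listed as a misspelling is not corrected a second time.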