Compare two string arrays with a if loop

2 visualizzazioni (ultimi 30 giorni)
Tatjana Mü
Tatjana Mü il 19 Mag 2022
Modificato: Jan il 19 Mag 2022
Hi,
now it gets complicated. And I reach my limits with Matlab. I have a xlsx file with compositions of particles (SMP_3_16_3.xslx) and a databank of minerals (Datenbank_Minerale.xlsx). These are just test files. Later this files will have more than 30.000 lines.
So the idea is to import both files (which is working fine). I change some formations. Then I want to compare the composition of the particle (column 1 of string array: iwant), with the databank compositions (column 1 of Datenbank_Minerale). If they are equal, the mineral names (saved in column 2-x) should be written next to the composition in the sample file (iwant). Until now, we are not sure how many columns the databank will have, means we dont know how many minerals fit a composition (it will be different from line to line). To my code:
I import the data:
%--------------------Import of sample files--------------------------------
directory_name=uigetdir('','Ordner mit Messungen auswählen');
[nur_file_name,pfad]=uigetfile({'*.xlsx','xlsx (*.xslx)';'*.*','all Files'},...
'Die xslx-Files der Proben oeffnen (probe_001.xlsx=',[directory_name '/'], 'Multiselect', 'on');
nur_file_name=cellstr(nur_file_name);
nur_file_name=sort(nur_file_name);
filename=strcat(pfad,nur_file_name);
anzahl_files=size(filename,2);
for xy=1:anzahl_files
fid_in=fopen(char(filename(xy)),'r');
%------------------------Get the data from the files----------------------
tmpImport = importdata(filename{xy},',');
SP_RL = tmpImport.Tabelle1;
SP_RL = regexprep(SP_RL,'\''','');
SP_RL = regexprep(SP_RL,'\+','');
SP_RL = string(SP_RL);
[anzahl_zeile,anzahl_elemente]=size(SP_RL);
%----------------Delete numbers and duplicate of elements-----------------
SP_RL= regexprep(SP_RL, '\d+(?:_(?=\d))?', '');
for x=1:anzahl_zeile
iwant(x,:) = join(unique(split(string(SP_RL(x,:)))),',');
end
iwant=eraseBetween(iwant,1,1);
[anzahl_Reihen,anzahl_Spalten]=size(iwant);
%-----------------------Import databank---------------------------------
directory_name=uigetdir('','Ordner mit Messungen auswählen');
[nur_file_name,pfad]=uigetfile({'*.xlsx','xlsx (*.xlsx)';'*.*','all Files'},...
'Das xlxs-File der Proben oeffnen (probe_001.xlsx=',[directory_name '/']);
RLImport = importdata('Datenbank_Minerale.xlsx');
Datenbank=string(RLImport);
[anzahl_lines,anzahl_columns]=size(Datenbank);
This part is working fine. Now I want to compare it and save the mineral names in the sample file but I reach my limit how to tell matlab that I want to have the mineral names in the sample file.
for i=1:anzahl_Reihen
if strcmp(Datenbank,iwant(i,:));
iwant(i,2:anzahl_columns)= ???
end
end
I hope somebody can help me, because this is so complicated.
Thank you really much
  1 Commento
Jan
Jan il 19 Mag 2022
Modificato: Jan il 19 Mag 2022
@Tatjana Mü: Simplify your view on the problem:
"So the idea is to import both files (which is working fine)" - than you can omit this part in the description, which mentions the Excel files. Start with some already imported data: "I have a cell array/table containing..."
"Later this files will have more than 30.000 lines." - then they are still small.
This is fragile:
filename=strcat(pfad,nur_file_name);
Prefer fullfile instead of strcat.
Instead of casting a scalar cell string to a CHAR, simply access the contents of the cell string:
% fid_in=fopen(char(filename(xy)),'r');
fid_in=fopen(filename{xy}, 'r');
You do this some linaes later, but fid_in is neither used nor closed. Simply omit the fopen line.
The text of the question until "This part is working fine" is not useful to solve your problem, but typing this has wasted your time only - and the time of the readers. In opposite to this, the explanation of the actual problem is really lean: "compare it and save the mineral names in the sample file".
Your "Datenbank" and "iwant" are string matrices or vectors? Then start with some Matlab code, which creates a tiny but relevant example and post a manually construted output.

Accedi per commentare.

Risposta accettata

Jan
Jan il 19 Mag 2022
Modificato: Jan il 19 Mag 2022
A simpler formulation of your question might be:
Datenbank = [ ...
"Na,Pb" "Abellaite" ""; ...
"Ni" "Abelsonite" ""; ...
"Ce,Na,P,S" "Abenakiite(-Ce)" ""; ...
"As,K,U" "Abhurite" ""; ...
"K,Rb,Si,Sr" "Test1" ""; ...
"Kr,Os,Pt,Rb,Sr" "Test2" "Test3"];
iwant = [ ...
"Ge,Hg,Na,Pt,Si"; ...
"Na,Pb"; ...
"Ir,Na"; ...
"Ca,Mg"; ...
"Mg"; ...
"Cr,Hg,Mg,Mo,Pt,Ru,Si"; ...
"Si"; ...
"As,K,U"; ...
"Cu,Mo,P,Ru"; ...
"As,Bi,Ca,Cd,Cr,Er,Hf,Lu,Nb,Ni,Pd,Pt,Sb,Sn,Sr,Ta,Te,Ti,Tm,V,W,Xe,Y,Yb,Zn,Zr"];
"If they are equal, the mineral names (saved in column 2-x) should be written next to the composition in the sample file (iwant)"
[match, index] = ismember(iwant(:, 1), Datenbank(:, 1));
iwant(:, 2) = ""; % Insert empty strings or leave it <missing>
iwant(match, 2) = Datenbank(index(match), 2)
iwant = 10×2 string array
"Ge,Hg,Na,Pt,Si" "" "Na,Pb" "Abellaite" "Ir,Na" "" "Ca,Mg" "" "Mg" "" "Cr,Hg,Mg,Mo,Pt,Ru,Si" "" "Si" "" "As,K,U" "Abhurite" "Cu,Mo,P,Ru" "" "As,Bi,Ca,Cd,Cr,Er,Hf,Lu,Nb,Ni,Pd,Pt,Sb,Sn,Sr,Ta,Te,Ti,Tm,V,W,Xe,Y,Yb,Zn,Zr" ""
  2 Commenti
Tatjana Mü
Tatjana Mü il 19 Mag 2022
@Jan Thank you so much for your help. Two questions: 1. I thought I have to use strcmp because it should be exactly the same composition. Not less or more elements. If I use ismember, I thought it shows me also lines, which have more elements? 2. The code you wrote is working but just for one mineral. I see Test 2 but not Test 3. How can I add this? Can you maybe explain the last code line?
Jan
Jan il 19 Mag 2022
Modificato: Jan il 19 Mag 2022
"strcmp because it should be exactly the same composition" - ismember() searchs for elements in one string array, which occur in the other string array. They must be exactly the same as with strcmp(). As far as I remember, ismember() does even call strcmp . The meaning of the strings does not matter: Matlab does not understand, that they are comma separated lists. So if you search for "a,b,c" or "axy123" makes no difference. So "not less or more elements" is not meaningful here as long as Matlab does not know anything about "elements".
"The code you wrote is working but just for one mineral." - What does this mean? All minerals of iwant contained in the example part of the DatenBank are found. With the test data, there are 2.
"I see Test 2 but not Test 3." - I see none of them in my example data and you do not provide an example also.
Remember, that my code need not be the full solution of your problem. I'm still trying to find out, what your problem is and what you want as solution. I've just simplified the inputs and removed the not relevant parts of the question to clarify your needs. Again: Please post some relevant input data and the wanted output.
"Can you maybe explain the last code line?" - Please read the documentation of ismember .
iwant(match, 2) = Datenbank(index(match), 2)
Set the elements in the 2nd column of iwant to the corresponding values of Datenbank. If a string in iwant is found in Datenbank, the corrsponding element of the 2nd column of Datenbank is copied.
Now maybe you want all existing elements of the corresponding row. Then the structure of Datenbank is not useful. This would be smarter:
Datenbank = { ...
"Na,Pb" ["Abellaite"]; ...
"Ni" ["Abelsonite"]; ...
"Ce,Na,P,S" ["Abenakiite(-Ce)"]; ...
"As,K,U" ["Abhurite"]; ...
"K,Rb,Si,Sr" ["Test1"]; ...
"Kr,Os,Pt,Rb,Sr" ["Test2", "Test3"]};
Instead of letting Datenbank by a string matrix, it is a cell array, which contains the keyword in the first column and a list of names in the 2nd column.
An important part of programming is the choice of an efficient data representation.
You can convert your Datenbank string array easily:
Datenbank = [ ...
"Na,Pb" "Abellaite" ""; ...
"Ni" "Abelsonite" ""; ...
"Ce,Na,P,S" "Abenakiite(-Ce)" ""; ...
"As,K,U" "Abhurite" ""; ...
"K,Rb,Si,Sr" "Test1" ""; ...
"Kr,Os,Pt,Rb,Sr" "Test2" "Test3"];
[n, m] = size(Datenbank);
DB = cell(n, 2);
for k = 1:n
DB{k, 1} = Datenbank(k, 1);
list = Datenbank(k, 2:m);
DB{k, 2} = list(list ~= "");
end
DB
DB = 6×2 cell array
{["Na,Pb" ]} {["Abellaite" ]} {["Ni" ]} {["Abelsonite" ]} {["Ce,Na,P,S" ]} {["Abenakiite(-Ce)" ]} {["As,K,U" ]} {["Abhurite" ]} {["K,Rb,Si,Sr" ]} {["Test1" ]} {["Kr,Os,Pt,Rb,Sr"]} {["Test2" "Test3"]}

Accedi per commentare.

Più risposte (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by