Azzera filtri
Azzera filtri

Remove duplicate variables depending on a variable in a second column

2 visualizzazioni (ultimi 30 giorni)
Dear experts, I have a list of variables where I need to remove duplicate variables based on the variable in column 2. Variables with a '1' in column 2 are of better quality than variables with a '0'.
1) In case of duplicate variables, I want to keep the variables that have value 1 in the second column. In cases when there are multiple duplicates with a 1 then it needs to keep randomly only one variable. See example below: Here I want to keep the variable BG1028 where the data in the third column is 1.3. For BG1030, I want to keep the variable with 3.0 or 0.3 in the third column.
2) In case of duplicate variables which all have a zero in the second column then it needs to keep randomly only one variable. See example below: I need to keep one variable of BG1027 (random choice).
I hope it is clear. Im puzzling how to do this. This is the code I came up with so far with help from Kirby Fear.
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
'BG1030';'BG1030';'BG1030';'BG1030'},... % start col 2
{'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},... % start col 3
{'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];
% Storing ppn column 2 as numerical values
bPpn=cell2mat(cellfun(@(c)str2double(c),ppn(:,2),...
'UniformOutput',false));
% Get names of duplicates
chooseNames = ppn([strcmp(ppn(1:end-1,1),ppn(2:end,1));false],1);
% Loop over chooseNames and keep one at random.
if numel(chooseNames)>0,
for j=1:numel(chooseNames),
dupidx=find(strcmp(chooseNames{j},ppn(:,1)));
dupidx(randi(numel(dupidx)))=[];
ppn(dupidx,:)=[];
end
end

Risposta accettata

WAT
WAT il 24 Set 2015
Give something like this a try:
ppn = [ {'BG1026';'BG1027';'BG1027';'BG1028';'BG1028';'BG1028';'BG1029';'BG1029';...
'BG1030';'BG1030';'BG1030';'BG1030'},... % start col 2
{'0';'0';'0';'1';'0';'0';'1';'0';'0';'1';'0';'1'},... % start col 3
{'1.2';'2.2';'5.2';'4.2';'0.2';'8.9';'3.4';'3.0';'0.3';'1.3';'0.3';'1.7'} ];
[uniqNames, ia, ic] = unique(ppn(:,1));
ia = [ia; 1+length(ic)];
ppn_out = {}; % initialize output
for i = 1:length(uniqNames);
sub = ppn(ia(i):ia(i+1)-1,:); % find only entries with uniqNames(i)
sub = sub(find(cell2mat(sub(:,2)) == max(cell2mat(sub(:,2)))),:); % find only those entries with the maximal value in col 2
ppn_out = [ppn_out; sub(randi(size(sub,1)),:)]; % select one entry at random, put it in ppn_out
end
  3 Commenti
WAT
WAT il 24 Set 2015
Modificato: WAT il 24 Set 2015
That's odd, it's also skipping BG1026 for you. It seems to be behaving fine for me, I wonder if there's something goofy in the unique() command? (I'm on R2013a or R2015a and it works fine on both)
Try getting rid of all the semicolons, it's short enough that it should be easy to follow what the code is doing.
Marty Dutch
Marty Dutch il 25 Set 2015
Yes, youre right. I tried running it on 2013b and in 2015a and in both it seems to work fine now... Thanks for the response. It does not worked in the 2012b version.

Accedi per commentare.

Più risposte (0)

Categorie

Scopri di più su Elementary Math in Help Center e File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by