intersection of multiple arrays

I have multiple 1D arrays with integers. Eg. A=[1,2,3,4] B=[1,4,5,6,7] C=[1,5,8,9,10,11]. Here 1 is repeated in all the arrays and 4 in A&B, 5 in B&C. I want these repeated values to be assigned to a single array, based on the size of the array, and removed from other arrays. How can I achieve this. Please help. TIA.

5 Commenti

KSSV
KSSV il 14 Set 2017
YOu want to remove 1, 5 and 4 from all the sets?
KL
KL il 14 Set 2017
What do you mean by "based on the size of the array"?
@KSSV. I want all the arrays to contain unique element. Meaning no value should be repeated in 2 or more arrays.
@KL. "based on the size of the array" means whichever array having smaller number of elements, the repeated value should be assigned in that array and removed from other arrays.
KL
KL il 14 Set 2017
So you only wanto to remove 1 from all the matrices in the above example?
Ananya Malik
Ananya Malik il 14 Set 2017
Modificato: Ananya Malik il 14 Set 2017
No. The common elements are [1,4,5]. Upon removing them from all arrays, we get A=[2,3] B=[6,7] and C=[8,9,10,11]. 1 can be present in either A or B. Suppose we put 1 in A ==> A=[1,2,3] B=[6,7] C=[8,9,10,11]. 4 can be present in A or B. Size(B)<size(A), therefore A=[1 2 3] B=[4,6 7] c=[8,9,10,11]. Finally 5 can be present in B or C, but size(B)<size(C). Therefore the final output should be A=[1,2,3] B=[4,5,6,7] and C=[8,9,10,11].

Accedi per commentare.

 Risposta accettata

Guillaume
Guillaume il 14 Set 2017
This is in no way guaranteed to give you a perfectly balanced output. After that you need to look at optimisation algorithm which is beyond my field. This is also a lot less efficient than my other solution:
C = {[1 2 3 4 5 6], [1 4 5 6 7], [1 5 8 9 12 20]}
[uvals, ~, bin] = unique(cell2mat(C));
origarray = repelem(1:numel(C), cellfun(@numel, C));
dist = table(uvals', accumarray(bin, 1), accumarray(bin, origarray, [], @(x) {x}), 'VariableNames', {'value', 'repetition', 'arrayindices'});
dist = sortrows(dist, 'repetition');
The above builds a table of all the unique values, how many times they're repeated and where they come from originally. You can then iterate through that to distribute the values:
%distribute non-repeated values first
newC = accumarray(cell2mat(dist.arrayindices(dist.repetition == 1)), ...
dist.value(dist.repetition == 1), [], @(x){x'});
%and remove them from table
dist(dist.repetition == 1, :) = [];
%distribute remaining values one at a time
for row = 1:height(dist)
destarrays = dist.arrayindices{row}; %which array originally had the current value?
[~, destidx] = min(cellfun(@numel, newC(destarrays))); %find which is currently smallest
newC{destarrays(destidx)} = [newC{destarrays(destidx)}, dist.value(row)]; %append value to that smallest array
end
celldisp(newC)
I'm using a table here, which is not the most efficient container in matlab, but that make it easier to understand the code.

1 Commento

Thank you so much. Not exactly what I wanted, but pretty close. Will tweak it further for my use. Cheers

Accedi per commentare.

Più risposte (2)

KL
KL il 14 Set 2017
A=[1,2,3,4];
B=[1,4,5,6,7];
C=[1,5,8,9,10,11];
M = {A,B,C};
N = {[B C], [A C], [A B]};
CommonElements = unique(cell2mat(cellfun(@intersect, M,N,'UniformOutput',false)));
NewM = cellfun(@(x) removeEl(x,CommonElements),M,'UniformOutput',false);
%and then
function x = removeEl(x,a)
for el=1:length(a)
x(x==a(el))=[];
end
end
Guillaume
Guillaume il 14 Set 2017
This is how I'd do it:
C = {[1 2 3 4], [1 4 5 6 7], [1 5 8 9 10 11]};
%sort cell array by increasing vector size:
[~, order] = sort(cellfun(@numel, C));
C = C(order);
%get all unique values
uvals = unique(cell2mat(C));
%go through all arrays, keeping all values in uvals, then removing them from uvals so they can't be used by the next array
for iarr = 1:numel(C)
C{iarr} = intersect(C{iarr}, uvals); %only keep the values in uvals
uvals = setdiff(uvals, C{iarr}); %and remove the one we've just used
end
celldisp(C)

1 Commento

Ananya Malik
Ananya Malik il 14 Set 2017
Modificato: Ananya Malik il 14 Set 2017
I am trying to balance the number of elements in the arrays. So if I change my input to A=[1 2 3 4 5 6], B=[1 4 5 6 7] and C=[1 5 8 9 10 11]. Using the code you provided gives me A=[1 4 5 6 7] B=[2 3] and C=[8 9 10 11]. Whereas ideal case would be A=[2 3 4 6] B=[1 7 5] and C=[8 9 10 11] or A=[1 2 3 4] B=[5 6 7] C=[8 9 10 11] or something along this line.

Accedi per commentare.

Prodotti

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by