MATLAB Answers

## How to subset in matrix based on the first 3 columns?

Asked by Clarisha Nijman

### Clarisha Nijman (view profile)

on 1 Nov 2018
Latest activity Commented on by Clarisha Nijman

### Clarisha Nijman (view profile)

on 3 Nov 2018
Accepted Answer by Guillaume

### Guillaume (view profile)

Hello, I am trying to find subsets/matrices in matrix A, based on the first 3 columns, and then computing probabilities. For such a small task, the code I wrote looks tremendously long, and the results are not good at all! Is there a better way to do this in MATLAB? Working with for loops and while loops is very difficult for me.
%given matrix
A=[ 1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
2 3 4 1 2 3;
2 3 4 2 3 4;
1 2 3 3 4 2;
1 4 3 2 3 4;
1 3 4 3 2 4;
1 4 3 1 2 3;
2 3 4 1 2 3];
%Subsets B, formed by grouping the rows of A whose first 3 columns are identical (A(i,1:3) equal across rows), should be:
This part of the code works!
1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
1 2 3 3 4 2;
2 3 4 1 2 3;
2 3 4 2 3 4;
2 3 4 1 2 3;
1 4 3 2 3 4;
1 4 3 1 2 3;
1 3 4 3 2 4;
%final result matrix C with the probability of 1 element in the subset should be:
This is my problem! How to find the correct probabilities.
size(B,1)=4
1 2 3 2 3 4 2/4;
1 2 3 3 2 4 ¼;
1 2 3 3 4 2 ¼ ;
size(B,1)=3
2 3 4 1 2 3 2/3 ;
2 3 4 2 3 4 1/3 ;
size(B,1)=2
1 4 3 2 3 4 ½ ;
1 4 3 1 2 3 ½ ;
size(B,1)=1
1 3 4 3 2 4 1;
The code:
%add column to matrix for indicator variable
indicator=zeros(size( A,1),1);
A=[A indicator];
for i=1:size(A,1)
    if A(i,size(A,2))==0 %consider only not adjusted indicators
        k=0;
        while i+k<=size(A,1)%takes care that index is not exceeded
            if A(i,1:3)==A(i+k,1:3)
                A(i+k,size(A,2))=i;%indicator variable
            end
            k=k+1;
        end
    end
end
%add column to matrix for frequency in the subset
freq=zeros(size( A,1),1);
A=[A freq];
%start subsetting and compute the pdf
j=1;
while j<=max(A(:,size(A,2)-1))
    B=A(A(:,size(A,2)-1)==j,:);%save the j-th subset in B
    for i=1:size(B,1)
        if B(i,size(B,2))==0 %consider only not adjusted indicators
            k=0;
            while i+k<=size(B,1)%takes care that index is not exceeded
                if B(i,1:6)==B(i+k,1:6)
                    B(i+k,size(B,2))=i;%indicator variable
                    B
                    %subsetting to find frequencies
                    for v=1:max(B(:,size(B,2)))
                        C=B(B(:,size(B,2))==v,:);%save the j-th subset in B
                        %computing probability of each element in subset
                        for w=1:size(C,1)
                            C(w,size(C,2))= 1/ C(w,size(C,1));
                            C
                        end
                        for w=1:size(C,1)
                            z=1;
                            while z+w<size(C,1)
                                if C(w,1:6)==C(w+z,1:6)
                                    C(w,size(C,2))=C(w,size(C,2))+C(w+z,size(C,2));
                                    C(w+z,size(C,2))=0;
                                end
                                z=z+1;
                            end
                            %remove lines with probability zero
                            % Specify conditions, which rows should be
                            % removed
                            weg = C(:,size(C,2))==0;
                            % remove
                            C(weg,:) = [];
                            E=[E;C];
                        end
                    end
                end
                k=k+1;
            end
        end
    end
    j=j+1;
end

Bruno Luong

### Bruno Luong (view profile)

on 1 Nov 2018
'I am trying to find subsets/matrices in matrix A, based on the first 3 columns, and then computing probabilities'
This description lacks clarity, and I will certainly not go over your code to figure out what you want to achieve.
JohnGalt

### JohnGalt (view profile)

on 1 Nov 2018
Agreed with Bruno. "Hello, I am trying to find subsets/matrices in matrix A, based on the first 3 columns, and then computing probabilities": find sub-matrices of what form? Computing probabilities of what?
Guillaume

### Guillaume (view profile)

on 1 Nov 2018
My understanding is that all rows with identical columns 1 to 3 belong to a subset. The probability of a row is the number of times it appears in the matrix divided by the number of rows in the subset it belongs to.
I too have not tried to understand the code.
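To make that reading concrete, here is a minimal sketch that applies the definition to a single row of the matrix from the question. The variable names (`row`, `insubset`, `isrow`) are made up for illustration, and the sketch assumes the interpretation above is the intended one.

```matlab
% Worked example of the definition for one row (row 1 of A).
A = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     2 3 4 1 2 3;
     2 3 4 2 3 4;
     1 2 3 3 4 2;
     1 4 3 2 3 4;
     1 3 4 3 2 4;
     1 4 3 1 2 3;
     2 3 4 1 2 3];
row = A(1, :);                                      % row of interest: 1 2 3 2 3 4
insubset = ismember(A(:, 1:3), row(1:3), 'rows');   % rows sharing columns 1 to 3 with it
isrow = ismember(A, row, 'rows');                   % rows identical to it in all 6 columns
probability = nnz(isrow) / nnz(insubset)            % 2 / 4 = 0.5
```

Four rows start with 1 2 3 and two of them equal 1 2 3 2 3 4, which gives the 2/4 listed in the question.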


## 1 Answer

Answer by Guillaume

### Guillaume (view profile)

on 1 Nov 2018
Edited by Guillaume

### Guillaume (view profile)

on 2 Nov 2018
Accepted Answer

If I understood correctly:
A=[ 1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
2 3 4 1 2 3;
2 3 4 2 3 4;
1 2 3 3 4 2;
1 4 3 2 3 4;
1 3 4 3 2 4;
1 4 3 1 2 3;
2 3 4 1 2 3];
[~, ~, uid] = unique(A, 'rows'); %get unique id for each row of A
count = accumarray(uid, 1); %get count of how many times each unique row of A appears
count = count(uid); %and assign to each row
[~, ~, subset] = unique(A(:, 1:3), 'rows'); %identify which subset each row belongs to
subsetcount = accumarray(subset, 1); %count the number of rows in each unique subset
subsetcount = subsetcount(subset); %and assign to each row
probability = count ./ subsetcount; %calculate the probability of each row in its subset
%for pretty display
table(A, subset, probability)
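I'm using accumarray to compute histograms. You could replace each instance of accumarray(x, 1) with histcounts(x, 'BinMethod', 'integers')' if that is clearer for you. The trailing ' transposes the row vector returned by histcounts into a column, matching the output of accumarray.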
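As a rough illustration of that equivalence, the sketch below counts the rows per subset both ways and compares the results. The names `viaaccumarray` and `viahistcounts` are hypothetical. The comparison relies on the ids returned by unique being consecutive integers starting at 1, so every integer bin is occupied.

```matlab
% Compare the two ways of counting how many rows fall in each subset.
A = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     2 3 4 1 2 3;
     2 3 4 2 3 4;
     1 2 3 3 4 2;
     1 4 3 2 3 4;
     1 3 4 3 2 4;
     1 4 3 1 2 3;
     2 3 4 1 2 3];
[~, ~, subset] = unique(A(:, 1:3), 'rows');                     % subset id (1 to 4) for each row
viaaccumarray = accumarray(subset, 1);                          % column vector of counts
viahistcounts = histcounts(subset, 'BinMethod', 'integers')';   % row vector, transposed to a column
same = isequal(viaaccumarray, viahistcounts)                    % true: both give [4; 1; 2; 3]
```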

#### 4 Comments

Clarisha Nijman

### Clarisha Nijman (view profile)

on 2 Nov 2018
It works!!!!
only a little u missing in third line: count = count(uid);
For me, this output works the best: D=[A probability];
Now I am trying to remove duplicate lines/rows. I tried the following code without success:
E=unique(sort(D), 'rows');
F=unique(E)
uA = unique(E, 'rows', 'stable');
zB = unique(E,'rows')
Do you have any suggestions for me?
Thank you in advance
Guillaume

### Guillaume (view profile)

on 2 Nov 2018
You'll notice I used meaningful names in my answer. I have no idea what D, E, F are in your code. Code whose variables have meaningful names is instantly easier to understand.
Note that the sort in unique(sort(x)) is pointless. unique does a sort anyway, unless you use the 'stable' option.
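A small sketch of that point, using made-up data (`x` and `D` here are illustrative and not the variables from the code above). It also shows a side effect worth knowing when the input is a matrix: sort orders each column independently, so it can break rows apart, whereas sortrows keeps each row intact.

```matlab
% unique already returns sorted output unless 'stable' is requested.
x = [3 1 2 3 1];
sorted = unique(x)                              % 1 2 3
stable = unique(x, 'stable')                    % 3 1 2 (order of first appearance)
same = isequal(unique(sort(x)), unique(x))      % true: the extra sort changes nothing

% On a matrix, sort works column by column and can mix up the rows.
D = [1 9;
     2 8];
bycolumn = sort(D)                              % [1 8; 2 9], neither row exists in D
byrow = sortrows(D)                             % [1 9; 2 8], rows kept whole
```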
If you don't want the repeated rows in each subset, one method:
[rows, urow, uid] = unique(A, 'rows'); %get unique rows, where they come from, and unique id for each
count = accumarray(uid, 1); %histogram of rows, matches the rows variable
[~, ~, subset] = unique(A(:, 1:3), 'rows'); %identify which subset each row belongs to
subsetcount = accumarray(subset, 1); %count the number of rows in each unique subset
subsetcount = subsetcount(subset); %and assign to each row
probability = count ./ subsetcount(urow);
%for pretty display
subset = subset(urow);
table(rows, subset, probability)
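If a plain numeric matrix is preferred over a table, in the [A probability] layout mentioned earlier in the thread, the same steps can be sketched as follows. The names `result` and `total` are hypothetical, and the final check simply confirms that the probabilities within each subset add up to 1.

```matlab
% Unique rows with their probability appended as a 7th column.
A = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     2 3 4 1 2 3;
     2 3 4 2 3 4;
     1 2 3 3 4 2;
     1 4 3 2 3 4;
     1 3 4 3 2 4;
     1 4 3 1 2 3;
     2 3 4 1 2 3];
[rows, urow, uid] = unique(A, 'rows');            % unique rows, first occurrence, id per row of A
count = accumarray(uid, 1);                       % occurrences of each unique row
[~, ~, subset] = unique(A(:, 1:3), 'rows');       % subset id for each row of A
subsetcount = accumarray(subset, 1);              % number of rows in each subset
probability = count ./ subsetcount(subset(urow)); % one probability per unique row
result = [rows probability]                       % 8-by-7 numeric matrix

% Sanity check: probabilities within each subset should sum to 1.
total = accumarray(subset(urow), probability);
assert(all(abs(total - 1) < 1e-12));
```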
Clarisha Nijman

### Clarisha Nijman (view profile)

on 3 Nov 2018
Thanks a lot, Guillaume!
