Asked by Clarisha Nijman
on 1 Nov 2018

Hello, I am trying to find subsets (submatrices) of matrix A, based on the first 3 columns, and then compute probabilities. For such a small task the code I wrote looks tremendously long, and the results are not good at all! Is there a better way to do this in MATLAB? Working with for loops and while loops is very difficult for me.

%given matrix

    A=[ 1 2 3 2 3 4;
        1 2 3 3 2 4;
        1 2 3 2 3 4;
        2 3 4 1 2 3;
        2 3 4 2 3 4;
        1 2 3 3 4 2;
        1 4 3 2 3 4;
        1 3 4 3 2 4;
        1 4 3 1 2 3;
        2 3 4 1 2 3];

%Subsets deduced from A(i,1:3) = A(i+1,1:3) = A(i+2,1:3). B should be:

This part of the code works!

    1 2 3 2 3 4;
    1 2 3 3 2 4;
    1 2 3 2 3 4;
    1 2 3 3 4 2;

    2 3 4 1 2 3;
    2 3 4 2 3 4;
    2 3 4 1 2 3;

    1 4 3 2 3 4;
    1 4 3 1 2 3;

    1 3 4 3 2 4;

%final result matrix C, with the probability of each element in its subset, should be:

This is my problem! How to find the correct probabilities.

size(B,1)=4

    1 2 3 2 3 4 2/4;
    1 2 3 3 2 4 1/4;
    1 2 3 3 4 2 1/4;

size(B,1)=3

    2 3 4 1 2 3 2/3;
    2 3 4 2 3 4 1/3;

size(B,1)=2

    1 4 3 2 3 4 1/2;
    1 4 3 1 2 3 1/2;

size(B,1)=1

    1 3 4 3 2 4 1;

The code:

    %add column to matrix for indicator variable
    indicator=zeros(size(A,1),1);
    A=[A indicator];
    for i=1:size(A,1)
        if A(i,size(A,2))==0 %consider only not adjusted indicators
            k=0;
            while i+k<=size(A,1) %takes care that index is not exceeded
                if A(i,1:3)==A(i+k,1:3)
                    A(i+k,size(A,2))=i; %indicator variable
                end
                k=k+1;
            end
        end
    end

    %add column to matrix for frequency in the subset
    freq=zeros(size(A,1),1);
    A=[A freq];

    %start subsetting and compute the pdf
    j=1;
    while j<=max(A(:,size(A,2)-1))
        B=A(A(:,size(A,2)-1)==j,:); %save the j-th subset in B
        for i=1:size(B,1)
            if B(i,size(B,2))==0 %consider only not adjusted indicators
                k=0;
                while i+k<=size(B,1) %takes care that index is not exceeded
                    if B(i,1:6)==B(i+k,1:6)
                        B(i+k,size(B,2))=i; %indicator variable
                        B
                        %subsetting to find frequencies
                        for v=1:max(B(:,size(B,2)))
                            C=B(B(:,size(B,2))==v,:); %save the j-th subset in B
                            %computing probability of each element in subset
                            for w=1:size(C,1)
                                C(w,size(C,2))= 1/ C(w,size(C,1));
                                C
                            end
                            for w=1:size(C,1)
                                z=1;
                                while z+w<size(C,1)
                                    if C(w,1:6)==C(w+z,1:6)
                                        C(w,size(C,2))=C(w,size(C,2))+C(w+z,size(C,2));
                                        C(w+z,size(C,2))=0;
                                    end
                                    z=z+1;
                                end
                                %remove lines with probability zero
                                % Specify conditions, which rows should be
                                % removed
                                weg = C(:,size(C,2))==0;
                                % remove
                                C(weg,:) = [];
                                E=[E;C];
                            end
                        end
                    end
                    k=k+1;
                end
            end
        end
        j=j+1;
    end

Answer by Guillaume
on 1 Nov 2018

Edited by Guillaume
on 2 Nov 2018

Accepted Answer

If I understood correctly:

    A=[ 1 2 3 2 3 4;
        1 2 3 3 2 4;
        1 2 3 2 3 4;
        2 3 4 1 2 3;
        2 3 4 2 3 4;
        1 2 3 3 4 2;
        1 4 3 2 3 4;
        1 3 4 3 2 4;
        1 4 3 1 2 3;
        2 3 4 1 2 3];

    [~, ~, uid] = unique(A, 'rows'); %get unique id for each row of A
    count = accumarray(uid, 1); %get count of how many times each unique row of A appears
    count = count(uid); %and assign to each row
    [~, ~, subset] = unique(A(:, 1:3), 'rows'); %identify which subset each row belongs to
    subsetcount = accumarray(subset, 1); %count the number of rows in each unique subset
    subsetcount = subsetcount(subset); %and assign to each row
    probability = count ./ subsetcount; %calculate the probability of each row in its subset

    %for pretty display
    table(A, subset, probability)

I'm using accumarray to compute histograms; you could replace each instance of accumarray(x, 1) with histcounts(x, 'BinMethod', 'integers')' if that's clearer for you.
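As a quick check of that equivalence, the sketch below computes the row counts both ways on the matrix from the question and compares them. The names countAccum and countHist are illustrative only and do not appear in the answer above. The two results agree here because the ids returned by unique are consecutive integers starting at 1, so each integer bin corresponds to exactly one id.

```matlab
% Matrix from the question
A = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     2 3 4 1 2 3;
     2 3 4 2 3 4;
     1 2 3 3 4 2;
     1 4 3 2 3 4;
     1 3 4 3 2 4;
     1 4 3 1 2 3;
     2 3 4 1 2 3];

% uid(k) is the index of row k of A in the sorted list of unique rows
[~, ~, uid] = unique(A, 'rows');

% Count occurrences of each id with accumarray (column vector)
countAccum = accumarray(uid, 1);

% Same counts with histcounts; it returns a row vector, hence the transpose
countHist = histcounts(uid, 'BinMethod', 'integers')';

% Both approaches should give identical counts
disp(isequal(countAccum, countHist))
```

If the ids did not start at 1 or had gaps, the two vectors could differ in length, so this swap is only assumed safe for ids produced by unique.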

Clarisha Nijman
on 2 Nov 2018

It works!!!!

There was only a little u missing in the third line: count = count(uid);

For me, this output works best: D=[A probability];

Now I am trying to remove duplicate rows. I tried the following code without success:

    E=unique(sort(D), 'rows');
    F=unique(E)
    uA = unique(E, 'rows', 'stable');
    zB = unique(E,'rows')

Do you have any suggestions for me?

Thank you in advance.

Guillaume
on 2 Nov 2018

You'll notice I used meaningful names in my answer. I have no idea what D, E, F are in your code. Code whose variables have meaningful names is instantly easier to understand.

Note that the sort in unique(sort(x)) is pointless. unique does a sort anyway, unless you use the 'stable' option.
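To illustrate the point about sort, here is a minimal sketch on a small made-up matrix (the values in D are hypothetical, not the D from the comment above). On a vector, sorting before unique changes nothing. On a matrix, sort orders each column independently, which mixes values from different rows, so it should not be applied before a row-wise unique.

```matlab
% On a vector, sorting first makes no difference to the result of unique
x = [3 1 2 3 1];
disp(isequal(unique(sort(x)), unique(x)))

% Hypothetical matrix: three key columns plus a probability column
D = [1 2 3 0.50;
     2 3 4 0.25;
     1 2 3 0.50];

% sort works column by column, so rows are no longer kept together
sortedD = sort(D);

% The first row of sortedD is [1 2 3 0.25], which is not a row of D
disp(ismember(sortedD(1, :), D, 'rows'))

% Removing duplicate rows directly, keeping the original order
dedup = unique(D, 'rows', 'stable');
disp(dedup)
```

Without the 'stable' option, unique(D, 'rows') returns the same rows in sorted order.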

If you don't want the repeated rows in each subset, one method:

    [rows, urow, uid] = unique(A, 'rows'); %get unique rows, where they come from, and unique id for each
    count = accumarray(uid, 1); %histogram of rows, matches the rows variable
    [~, ~, subset] = unique(A(:, 1:3), 'rows'); %identify which subset each row belongs to
    subsetcount = accumarray(subset, 1); %count the number of rows in each unique subset
    subsetcount = subsetcount(subset); %and assign to each row
    probability = count ./ subsetcount(urow);

    %for pretty display
    subset = subset(urow);
    table(rows, subset, probability)
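One way to sanity-check the result is to confirm that the probabilities within each subset add up to 1. The sketch below repeats the steps above on the matrix from the question and then sums the probabilities per subset with accumarray. The names subsetOfRow and total, and the tolerance 1e-12, are assumptions made for this illustration.

```matlab
% Matrix from the question
A = [1 2 3 2 3 4;
     1 2 3 3 2 4;
     1 2 3 2 3 4;
     2 3 4 1 2 3;
     2 3 4 2 3 4;
     1 2 3 3 4 2;
     1 4 3 2 3 4;
     1 3 4 3 2 4;
     1 4 3 1 2 3;
     2 3 4 1 2 3];

% Unique rows, index of one occurrence of each, and id of every row of A
[rows, urow, uid] = unique(A, 'rows');
count = accumarray(uid, 1);                 % occurrences of each unique row

% Subset id of every row of A, based on the first three columns
[~, ~, subset] = unique(A(:, 1:3), 'rows');
subsetcount = accumarray(subset, 1);        % number of rows in each subset

% Subset id and probability for each unique row
subsetOfRow = subset(urow);
probability = count ./ subsetcount(subsetOfRow);

% Sum of probabilities per subset; each entry should be 1 up to rounding
total = accumarray(subsetOfRow, probability);
disp(all(abs(total - 1) < 1e-12))
```

The comparison uses a tolerance rather than exact equality because fractions such as 1/3 and 2/3 are not represented exactly in floating point.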

Clarisha Nijman
on 3 Nov 2018

Thanks a lot, Guillaume!
