How are the following methods to compute correlation different?

Hello everyone,
Until recently I was computing the correlation between two matrices in a quite inefficient way. This was my initial approach:
corr_mat = zeros(m,m);
% Consider T as a pre-populated matrix of dimensions (n x 2*m)
for i = 1:m
    for j = 1:m
        % Correlate column i of the first half with column j of the second half
        corr_mat(i,j) = corr(T(:,i), T(:,j+m));
    end
end
However, from my understanding of the description of the 'corr' function, the above loop is equivalent to:
corr_mat = corr(T(:,1:m), T(:,(m+1):2*m));
I have tried both approaches and compared their results and it turns out they were different. However, upon generating matrices with random numbers and trying the same approach as above, I actually obtained the same results. Here is the test I made:
mat1 = randn(50,10);
mat2 = randn(50,10);
corr1 = zeros(10,10);
for i = 1:10
    for j = 1:10
        corr1(i, j) = corr(mat1(:,i), mat2(:,j));
    end
end
corr2 = corr(mat1, mat2); % Generates the same correlation matrix as corr1
This left me extremely confused because, to me, the above test is equivalent to the first two scripts. Could someone explain how the first two scripts differ from the above test and, most importantly, why the first two scripts generate different correlation matrices?
Thank you in advance!

Accepted Answer

dpb
dpb on 17 Oct 2022
Edited: dpb on 17 Oct 2022
But if you create T from your two mat arrays as T=[mat1 mat2]; then the results are all the same. If you got something different, it would be because the inputs were different, and you didn't provide any data to illustrate the first claim of getting a different result.
I made the two arrays smaller for convenience, but the conclusions still hold...
m1=randn(50,5); m2=randn(50,5);
T=[m1 m2];
m=size(m1,2);
all(corr(m1,m2)==corr(T(:,1:m),T(:,m+1:2*m)),'all')
ans = logical
1
% now the double loop result
for i=1:m,for j=1:m, c(i,j)=corr(T(:,i),T(:,j+m));end,end
all(c==corr(m1,m2),'all')
ans = logical
0
% OOOOh...the identical test fails, but ---
max(diff(c-corr(m1,m2)),[],'all')
ans = 1.4225e-16
This illustrates that the mismatch is just rounding error at double-precision magnitude, arising from the different routes used to compute the numbers.
Moral -- just use the vectorized corr function...
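Since the two routes differ only at the level of floating-point rounding, an exact == test is the wrong comparison. Here is a minimal sketch of comparing the looped and vectorized results with a tolerance instead. The seed set by rng(0) and the tolerance tol = 1e-12 are my own assumptions for illustration, not values from this thread; any tolerance comfortably above eps should behave similarly for data like this.

```matlab
% Sketch: compare looped and vectorized corr results within a tolerance.
rng(0);                       % assumed seed, only so the run is repeatable
m1 = randn(50,5);
m2 = randn(50,5);
T  = [m1 m2];                 % same layout as in the question: n x 2*m
m  = size(m1,2);

% Looped route: one column pair at a time
c = zeros(m,m);               % preallocate the result
for i = 1:m
    for j = 1:m
        c(i,j) = corr(T(:,i), T(:,j+m));
    end
end

% Vectorized route: all column pairs in one call
v = corr(T(:,1:m), T(:,m+1:2*m));

% Largest absolute elementwise difference between the two routes
maxErr = max(abs(c - v), [], 'all');

tol = 1e-12;                  % hypothetical tolerance, well above eps
sameWithinTol = maxErr < tol; % true when the results agree up to rounding
```

If sameWithinTol comes back false on real data, the inputs to the two routes probably differ (for example, different column ranges), and that is worth checking before suspecting corr itself.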

Release

R2020a
