How are the following methods to compute correlation different?
Hello everyone,
Until recently I was computing the correlation between two matrices in a rather inefficient way. This was my initial approach:
corr_mat = zeros(m,m);
% Consider T as a pre-populated matrix of dimensions (n x 2*m)
for i = 1:m
    for j = 1:m
        corr_mat(i,j) = corr(T(:,i), T(:,j+m));
    end
end
However, from my understanding of the description of the 'corr' function, the loop above is equivalent to:
corr_mat = corr(T(:,1:m), T(:,(m+1):2*m));
I tried both approaches and compared their results, and they turned out to be different. However, when I generated matrices of random numbers and applied the same two approaches, I obtained the same results. Here is the test I ran:
mat1 = randn(50,10);
mat2 = randn(50,10);
corr1 = zeros(10,10);
for i = 1:10
    for j = 1:10
        corr1(i, j) = corr(mat1(:,i), mat2(:,j));
    end
end
corr2 = corr(mat1, mat2); % Generates the same correlation matrix as corr1
This left me extremely confused because, to me, the above test is equivalent to the first two scripts. Would someone be able to explain how the first two scripts differ from the above test and, most importantly, why the first two scripts generate different correlation matrices?
Thank you in advance!
Accepted Answer
dpb
on 17 Oct 2022
Edited: dpb
on 17 Oct 2022
But if you create T from your two mat arrays as T=[mat1 mat2], then the results are all the same. If you got something different, it would be owing to the inputs being different, and you didn't give any data to illustrate the first contention of getting a different result.
I made the two arrays smaller for convenience, but conclusions still hold...
m1=randn(50,5); m2=randn(50,5);
T=[m1 m2];
m=size(m1,2);
all(corr(m1,m2)==corr(T(:,1:m),T(:,m+1:2*m)),'all')
% now the double loop result
for i=1:m,for j=1:m, c(i,j)=corr(T(:,i),T(:,j+m));end,end
all(c==corr(m1,m2),'all')
% OOOOh...the identical test fails, but ---
max(abs(c-corr(m1,m2)),[],'all')
illustrates it's just rounding error at double-precision magnitude between the different routes to compute the numbers.
Moral -- just use the vectorized corr function...
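Since the exact == test can fail on rounding alone, a tolerance-based comparison is the more reliable check. The following is only a minimal sketch of that idea: the seed, the tolerance tol, and the variable names maxDiff and isSame are assumptions chosen for illustration, not anything from the original post. It also assumes a release where max(...,[],'all') is available.

```matlab
% Sketch: compare the loop and vectorized results with a tolerance
rng(0);                            % arbitrary fixed seed, for repeatability only
m1 = randn(50,5);  m2 = randn(50,5);
T  = [m1 m2];
m  = size(m1,2);

c = zeros(m);                      % preallocate the loop result
for i = 1:m
    for j = 1:m
        c(i,j) = corr(T(:,i), T(:,j+m));
    end
end
v = corr(T(:,1:m), T(:,m+1:2*m));  % vectorized result

tol     = 1e-12;                   % assumed tolerance, well above double-precision rounding
maxDiff = max(abs(c - v), [], 'all');
isSame  = maxDiff < tol;           % true when the two routes agree to within tol
```

If isSame is false for real data, the difference is larger than rounding, which would point to the inputs of the two computations not being the same.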