Why does the result of the corr function depend on the number of columns?

Hello everyone,
I want to calculate the correlation coefficients between several physical parameters for an astrophysics application (MATLAB R2019b). Since I have many parameters, instead of calling corr(A,B) for each pair (A,B), I built a matrix X whose columns are the parameters of interest and computed corr(X,X) to get the correlation matrix, then read the pairwise correlation coefficients from the off-diagonal terms. However, I am surprised that the correlation coefficient for the same pair (A,B) varies depending on which other parameters (columns) I include in the matrix. As far as I understand, corr(X,Y) calculates pairwise correlation coefficients, which should depend only on the pair of columns considered, right?
Thank you in advance to anyone who can explain whether I have misunderstood how to use the corr function!
Philippe

Accepted Answer

Adam Danz on 22 Aug 2020
Edited: Adam Danz on 27 Aug 2020
Understanding the output of rho=corr(x)
r = corr(x,x);
is the same as
r = corr(x);
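The equivalence can be checked numerically. The following is a minimal sketch on made-up random data (the matrix size, the seed, and the variable names are illustrative assumptions, not from the question). It assumes the Statistics and Machine Learning Toolbox, which provides corr.

```matlab
% Sketch: compare corr(x,x) with corr(x) on hypothetical data.
% Assumes the Statistics and Machine Learning Toolbox (provides corr).
rng(0);                      % fixed seed so the example is repeatable
x = rand(20, 4);             % 20 observations of 4 hypothetical parameters
rTwoInputs = corr(x, x);     % every column of x against every column of x
rOneInput  = corr(x);        % same column pairs, single-input form

% Compare with a small tolerance rather than isequal, since the two
% call forms may differ by floating-point round-off.
maxDiff = max(abs(rTwoInputs(:) - rOneInput(:)));
sameResult = maxDiff < 1e-12
```

The tolerance is a cautious choice: the two forms return the same correlations, but bit-for-bit equality of floating-point results is not something to rely on.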
If a single value in column n of x changes, it affects the off-diagonal correlation results in row n and column n of the output; the entries for all other column pairs stay the same.
For example,
% x is an nx3 matrix
% r = corr(x)
% r shows the correlation between
% the following column pairs:
r =
    1 & 1    1 & 2    1 & 3
    2 & 1    2 & 2    2 & 3
    3 & 1    3 & 2    3 & 3
If a value changes in column 3, you can see above that it would affect all values in column 3 and row 3 of the correlation matrix.
Here's a demo
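x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
x1 = x0;
x1(10) = 9;      % linear index 10 is row 2, column 3 of the 4x3 matrix
r0 = corr(x0);
r1 = corr(x1);   % differs from r0 only in row 3 and column 3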
NaN infestation
As explained in this answer, a single NaN value in the input matrix of r=corr(x) at x(i,j) makes every correlation that involves column j of x undefined, so row j and column j of the output matrix are all NaN.
With two inputs, r=corr(x,y), the entry r(a,b) is the correlation between column a of x and column b of y. A single NaN in y at coordinate (i,j) therefore produces NaN values in column j of the output matrix while the remaining columns are unaffected; a single NaN in x at (i,j) does the same to row j of the output.
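The NaN behaviour can be inspected directly with isnan. This is a small sketch that reuses the 4x3 demo matrix from this answer and places one NaN at row 2, column 3; the variable names are illustrative, and corr is called with its default settings.

```matlab
% Sketch: where do NaNs appear in the output of corr?
x = [1 6 5; 9 3 5; 7 5 3; 5 9 5];   % 4 observations, 3 variables
xNaN = x;
xNaN(2,3) = NaN;                     % one missing value, in column 3

r1 = corr(xNaN);                     % one-input form
nanMask1 = isnan(r1)                 % true across row 3 and column 3

r2 = corr(x, xNaN);                  % NaN is in the second input
nanMask2 = isnan(r2)                 % true only in column 3

r3 = corr(xNaN, x);                  % NaN is in the first input
nanMask3 = isnan(r3)                 % true only in row 3
```

In each case the position of the NaN within its column (row 2 here) does not matter; only the column that contains it determines which outputs are lost.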
Ignoring missing values (e.g. NaN).
As explained in this answer, to compute column-wise correlation while ignoring missing values, set the 'Rows' name-value argument to either 'complete' or 'pairwise'.
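The two settings treat missing values differently, and the difference matters for the original question. With 'complete', any row containing a NaN in any column is dropped for all pairs, so the correlation of a fixed pair can change depending on which other columns are in the matrix. With 'pairwise', a row is dropped only for the pairs that involve the column with the NaN. Whether this is what happened in the question is not confirmed in the thread; the sketch below only illustrates the mechanism, using hypothetical data.

```matlab
% Sketch: 'complete' vs 'pairwise' handling of a single NaN.
% The data are made up; only column 4 contains a missing value.
x = [1 6 5 2; 9 3 5 3; 7 5 3 4; 5 9 5 1; 2 4 6 8];
x(5,4) = NaN;

rComplete = corr(x, 'Rows', 'complete');   % row 5 dropped for every pair
rPairwise = corr(x, 'Rows', 'pairwise');   % row 5 dropped only for pairs with column 4

% Reference: columns 1 to 3 alone contain no NaN, so all 5 rows are used.
rRef = corr(x(:,1:3));

% Does the correlation of columns 1 and 2 match the reference?
pairwiseMatches = abs(rPairwise(1,2) - rRef(1,2)) < 1e-12   % true
completeMatches = abs(rComplete(1,2) - rRef(1,2)) < 1e-12   % false
```

If the correlation of a pair should not depend on the other columns, 'pairwise' is the setting that preserves that property; 'complete' trades it for using the same set of rows for every pair.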
8 Comments
Ilaria Sani on 21 Apr 2022
Thanks for the above explanations.
I have a follow-up question.
What if I add an entire new column?
What I see is that the correlation of the untouched columns changes. Is that possible? Is there a correction for the number of comparisons?
Thank you so much for your kind reply.
Ilaria
Adam Danz on 21 Apr 2022
>... the correlation of untouched columns changes...
That shouldn't be the case. Consider this demo below where x1 is the same as x0 except for the addition of a 4th column. The corr results only differ in row 4 and column 4.
x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5]
x0 = 4×3
     1     6     5
     9     3     5
     7     5     3
     5     9     5
corr(x0)
ans = 3×3
    1.0000   -0.5270   -0.2928
   -0.5270    1.0000    0.2000
   -0.2928    0.2000    1.0000
x1 = [x0,[2;3;4;1]]
x1 = 4×4
     1     6     5     2
     9     3     5     3
     7     5     3     4
     5     9     5     1
corr(x1)
ans = 4×4
    1.0000   -0.5270   -0.2928    0.5292
   -0.5270    1.0000    0.2000   -0.7746
   -0.2928    0.2000    1.0000   -0.7746
    0.5292   -0.7746   -0.7746    1.0000
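Rather than comparing the printed matrices by eye, the same check can be done programmatically. This is a short sketch using the x0 and x1 from the demo; the names blockDiff and untouchedUnchanged are illustrative.

```matlab
% Sketch: confirm that adding a 4th column leaves the 3x3 block unchanged.
x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
x1 = [x0, [2;3;4;1]];          % same data plus one new column

r0 = corr(x0);
r1 = corr(x1);

% Largest absolute difference between r0 and the upper-left block of r1
blockDiff = max(max(abs(r1(1:3,1:3) - r0)));
untouchedUnchanged = blockDiff < 1e-12   % true
```

If this check fails on real data, missing values combined with a row-removal setting are one thing worth ruling out.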
