
What exactly do coeff and score represent in PCA and how can I reconstruct my data from them?

Hello all, having a bit of trouble figuring this out. Basically, I would like to run pca on one set of data, and then use the same transformation matrix on another set of data. My original thought was that the 'coeff' output of the pca command would give me this matrix; in other words, I would have expected that:
mydata*coeff == scores
Thus, if I had a second set of data, I could run PCA and obtain the following outputs:
[coeff_2 scores_2 latent_2] = pca(mydata_2)
or, I could obtain a similar output by using the coeff matrix from 'mydata' to calculate scores_2:
scores_2a = mydata_2*coeff
The matrix dimensions check out on this as well: if mydata is NxP, then coeff is PxP and scores is NxP, so mydata*coeff would be NxP. Although the dimensions check out, some sample calculations in MATLAB have shown me that mydata*coeff does not equal scores, and I'm not sure why.
It has been a while since I've taken any matrix algebra courses, so I may be misremembering or misunderstanding how these matrices are calculated. If anyone can help me out, it would be much appreciated!

Answers (1)

Swarooph on 15 Jul 2016
Good question! Made me dig a little bit so that was fun.
I think this is because pca centers the raw input data before the decomposition: it subtracts each column's mean so that the data has zero mean, but it does not scale by the standard deviation.
So in your example if you try the following, it should hold.
mydata = 10 + randn(20,5); %Random data 20 observations, 5 variables
[coeff,scores_a] = pca(mydata); %Do PCA
mydata_mean = mean(mydata); %Find mean of data (columns)
mydata_mean = repmat(mydata_mean,20,1); %Replicate mean vector to matrix for subtraction
my_data_norm = mydata - mydata_mean; % Center data to zero mean by subtraction
scores_b = my_data_norm*coeff; %Manually calculate scores using PCA coeff and normalized data
err = max(max((abs(scores_a - scores_b)))) %Calculate error as the max of absolute difference in 2 methods
For my random data runs, err was of the order of 1e-15.
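The same idea should carry over to the two things asked in the question: reusing the transformation on a second data set, and reconstructing the data from coeff and scores. Here is a minimal sketch, assuming the second data set has the same variables (columns) as the first. mydata_2 here is made-up random data, and mu is the sixth output of pca, which returns the column means it subtracted.

```matlab
% Sketch only: mydata_2 is made-up data standing in for the second data set.
rng(0);                                   % reproducible random numbers
mydata   = 10 + randn(20,5);              % first data set: 20 observations, 5 variables
mydata_2 = 10 + randn(10,5);              % hypothetical second data set, same 5 variables

% The sixth output, mu, is the row vector of column means that pca subtracted.
[coeff,scores,~,~,~,mu] = pca(mydata);

% Project the second data set using the FIRST data set's mean and coefficients.
scores_2a = bsxfun(@minus, mydata_2, mu) * coeff;

% Reconstruct the first data set: undo the rotation, then add the mean back.
mydata_rec = bsxfun(@plus, scores * coeff', mu);
rec_err = max(abs(mydata_rec(:) - mydata(:)))   % should be floating-point rounding only
```

Note that scores_2a will generally differ from the scores returned by pca(mydata_2), since that call centers on the second data set's own mean and finds its own axes. Keeping only the first k columns of scores and coeff in the reconstruction line would give a rank-k approximation rather than an exact reconstruction.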
