PCA output: coefficients vs loadings
I would be grateful for some explanation of the output of principal component analysis (pca) from the Statistics Toolbox.
I have a dataset with 150 variables and ~50000 observations.
When I run pca on this, there is one dominant PC (latent variable) that accounts for >95% of the variance.
However, the first column of the output coefficient matrix has very low values for all the original variables (~0.06). My understanding is that the sum of squared loadings (i.e. the sum of squares of each column of the coefficient matrix) should equal the eigenvalue corresponding to each PC. However, sum(coeff.^2) returns 1 for every column. This leads me to suspect that the loadings in each column are being scaled?
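As a rough illustration of this symptom, here is a NumPy sketch (not MATLAB, and run on synthetic data, not the dataset above) in which 150 variables share one common factor. The helper `pca_cov` is hypothetical; it is meant to mimic the roles of MATLAB's coeff, score and latent by centering the data and eigendecomposing the covariance matrix. Unit-norm columns spread over 150 similar variables push each coefficient down to roughly 1/sqrt(150), while rescaling each column by the square root of its eigenvalue makes the column sums of squares equal the eigenvalues.

```python
import numpy as np


def pca_cov(X):
    """Hypothetical helper: centered (unscaled) PCA via the covariance matrix,
    returning coeff, score, latent in the same roles as MATLAB's pca outputs."""
    Xc = X - X.mean(axis=0)                    # center each column, no scaling
    sigma = np.cov(Xc, rowvar=False)           # p-by-p covariance, n-1 normalization
    latent, coeff = np.linalg.eigh(sigma)      # eigh returns ascending eigenvalues
    order = np.argsort(latent)[::-1]           # sort descending
    latent, coeff = latent[order], coeff[:, order]
    return coeff, Xc @ coeff, latent


# Synthetic data (assumed): 150 variables driven by one shared factor plus noise.
rng = np.random.default_rng(0)
n, p = 5000, 150
factor = rng.normal(size=(n, 1))
X = factor + 0.2 * rng.normal(size=(n, p))     # broadcasts the factor to all columns

coeff, score, latent = pca_cov(X)
explained = 100 * latent / latent.sum()

print(f"PC1 explains {explained[0]:.1f}% of the variance")
print("mean |coeff(:,1)|:", np.abs(coeff[:, 0]).mean(), "vs 1/sqrt(p):", 1 / np.sqrt(p))
print("sum(coeff.^2), first 3 PCs:", (coeff**2).sum(axis=0)[:3])

# Rescale column j by sqrt(latent(j)); the column sums of squares become the eigenvalues.
loadings = coeff * np.sqrt(latent)
print("sum(loadings.^2), first 3 PCs:", (loadings**2).sum(axis=0)[:3])
print("latent, first 3 PCs:          ", latent[:3])
```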
If I put the same data into SPSS I get the same eigenvalues/% explained but the component loadings on PC1 are now between 0.7 and 0.95.
Could anyone explain why and how these outputs differ?
Juyeong Choi on 21 Dec 2014
So, how do we calculate the PC1 loadings that SPSS reports? Does anyone have an idea?
Xiaosha Wang on 31 Jul 2015
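The MATLAB output is the coefficient matrix, whereas SPSS outputs loadings, defined as the correlation between a given principal component and an original variable. The two are related by a rescaling: the loading of variable i on component j is coeff(i,j)*sqrt(latent(j))/std(variable i), so within each column they are proportional when all variables share the same standard deviation (for example, after z-scoring).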
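A quick numerical check of the coefficient/correlation relationship, as a NumPy sketch on made-up data whose variables sit on very different scales. The `pca_cov` helper is hypothetical and mimics a centered, unscaled PCA. The correlation between variable i and score j should match coeff(i,j)*sqrt(latent(j))/std(variable i).

```python
import numpy as np


def pca_cov(X):
    """Hypothetical helper: centered (unscaled) PCA via the covariance matrix."""
    Xc = X - X.mean(axis=0)
    latent, coeff = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(latent)[::-1]            # largest eigenvalue first
    latent, coeff = latent[order], coeff[:, order]
    return coeff, Xc @ coeff, latent


# Made-up data: three variables on very different scales.
rng = np.random.default_rng(1)
n = 1000
base = rng.normal(size=n)
X = np.column_stack([base + 0.5 * rng.normal(size=n),
                     10 * base + 3 * rng.normal(size=n),
                     0.1 * rng.normal(size=n)])

coeff, score, latent = pca_cov(X)
sd = X.std(axis=0, ddof=1)

# Correlation-style loading: coeff(i,j) * sqrt(latent(j)) / sd(i)
corr_loadings = coeff * np.sqrt(latent) / sd[:, None]

# Brute-force check: correlate every original variable with every PC score.
p = X.shape[1]
direct = np.corrcoef(X, score, rowvar=False)[:p, p:]
print(np.round(corr_loadings, 3))
print(np.round(direct, 3))
```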
the cyclist on 11 Feb 2013
Edited: the cyclist on 13 Feb 2013
Disclaimer: I am not an expert on PCA. [EDIT: Proof of this is that I was wrong that MATLAB scales. See Ilya's answer, and my comment to my own answer, below.]
I believe that this difference is due to the fact that MATLAB first "centers and scales" the original data into z-scores. I am guessing that differences in the loadings are going to be related to that transformation. (Maybe a scaling factor of the standard deviation of each variable?)
The wikipedia page ( http://en.wikipedia.org/wiki/Principal_component_analysis ) is a good resource. The second paragraph has a brief discussion of the scaling.
Ilya on 11 Feb 2013
The princomp and pca functions center the data but do not scale it. (pca also lets you skip centering, via the 'Centered' option.)
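To see why centering without scaling matters, this NumPy sketch compares eigenvalues from centered-only data (the pca default described above) with eigenvalues from z-scored data, using two hypothetical variables measured in very different units. Z-scoring amounts to analyzing the correlation matrix; in MATLAB the analogous call would be pca(zscore(X)). Whether another package such as SPSS works from the correlation matrix by default is an assumption to check in its own documentation.

```python
import numpy as np


def eigvals_desc(M):
    """Eigenvalues of a symmetric matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]


# Hypothetical data: two correlated variables with very different units.
rng = np.random.default_rng(2)
n = 2000
a = rng.normal(size=n)
X = np.column_stack([a + 0.3 * rng.normal(size=n),           # spread about 1
                     100 * (a + 0.3 * rng.normal(size=n))])   # spread about 100

Xc = X - X.mean(axis=0)                 # centering only
Z = Xc / X.std(axis=0, ddof=1)          # centering and scaling (z-scores)

latent_centered = eigvals_desc(np.cov(Xc, rowvar=False))
latent_zscored = eigvals_desc(np.cov(Z, rowvar=False))

print(np.round(latent_centered, 3))     # dominated by the large-unit variable
print(np.round(latent_zscored, 3))      # correlation-matrix eigenvalues, summing to 2
```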
The easiest way to understand PCA is through the eigenvalue decomposition of the covariance matrix Sigma:
Sigma = V*Lambda*V'
Lambda is the diagonal matrix of eigenvalues, and V is an orthonormal matrix of coefficients. Orthonormality implies that the 2-norm of every column is 1, which is why sum(coeff.^2) returns 1 for every column.
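A minimal NumPy sketch of this decomposition on random data (assumed purely for illustration). It checks that Sigma = V*Lambda*V', that V'*V is the identity, and that the same eigenvalues come out of an SVD of the centered data, which is the default algorithm of MATLAB's pca. Eigenvector signs are arbitrary, so the two sets of vectors are compared up to sign.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 6
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # hypothetical correlated data

Xc = X - X.mean(axis=0)
Sigma = np.cov(Xc, rowvar=False)

# Eigen route: Sigma = V * Lambda * V', eigenvalues sorted largest first
lam, V = np.linalg.eigh(Sigma)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]
Lambda = np.diag(lam)

# SVD route on the centered data: Xc = U * S * W', so eigenvalues are S.^2 / (n - 1)
U, s, Wt = np.linalg.svd(Xc, full_matrices=False)
latent_svd = s**2 / (n - 1)

print(np.round(lam, 3))
print(np.round(latent_svd, 3))
print(np.round(np.linalg.norm(V, axis=0), 6))   # every column has 2-norm 1
```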
This is what the MATLAB implementation does. I am not familiar with the SPSS implementation.
Seung Yi Lee on 30 Aug 2021
Edited: Seung Yi Lee on 30 Aug 2021
Many years after the original question was posted, I ran into the same problem and figured it out.
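The coefficients are scaled relative to the loadings: each column is normalized to unit length, which removes the square root of its corresponding eigenvalue. Multiplying each column by that square root recovers the unscaled loadings. This worked for me: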
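[coeff, ~, latent] = pca(X);              % X: n-by-p data matrix, observations in rows
unscaled_loading = coeff.*sqrt(latent)';  % multiply column j by sqrt(latent(j))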
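For readers outside MATLAB, here is a NumPy translation of that one-liner, sketched on hypothetical one-factor data. When the data are z-scored first, coeff*sqrt(latent) gives the correlation between each variable and each component score, which is how loadings are defined earlier in this thread. On unstandardized data the same product gives covariances instead, so each row would also need dividing by that variable's standard deviation.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 3000, 10
factor = rng.normal(size=(n, 1))
X = factor + 0.5 * rng.normal(size=(n, p))           # hypothetical one-factor data

# Standardize first (like MATLAB's zscore), then do covariance-based PCA.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
latent, coeff = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(latent)[::-1]                     # largest eigenvalue first
latent, coeff = latent[order], coeff[:, order]
score = Z @ coeff

# Translation of: unscaled_loading = coeff.*sqrt(latent)'
unscaled_loading = coeff * np.sqrt(latent)           # broadcasts sqrt(latent) across columns

# On z-scored data these equal corr(variable, score).
direct = np.corrcoef(Z, score, rowvar=False)[:p, p:]
print(np.round(np.abs(unscaled_loading[:, 0]), 2))
```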