PCA output: coefficients vs loadings

Mathew Guilfoyle on 11 Feb 2013
Edited: Seung Yi Lee on 30 Aug 2021
I would be grateful for some explanation on the output of principal components analysis (pca) from the Statistics Toolbox.
I have a dataset with 150 variables and ~50000 observations.
When I submit this to PCA there is one dominant PC/latent variable that accounts for >95% of the variance.
However, the first column of the output coefficient matrix has very low values for the loadings of all the original variables (~0.06). My understanding is that the sum of squared loadings (i.e. the sum of squares of each column of the coefficient matrix) should equal the eigenvalue corresponding to each PC. However, sum(coeff.^2) returns 1 for every column. This leads me to suspect that each column of loadings is being rescaled?
If I put the same data into SPSS I get the same eigenvalues/% explained but the component loadings on PC1 are now between 0.7 and 0.95.
Could anyone explain why and how these outputs differ?
Thanks

Answers (5)

Juyeong Choi on 21 Dec 2014
So how do we calculate the loadings for PC1 as obtained in SPSS? Does anyone have an idea?

Xiaosha Wang on 31 Jul 2015
MATLAB returns the coefficient matrix, whereas SPSS reports loadings, defined as the correlation between a given principal component and an original variable. The two outputs (coefficients and loadings) are proportional: for standardized variables, each column of loadings is the matching column of coefficients multiplied by the square root of that component's eigenvalue.
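To make the "loadings are correlations" definition concrete, here is a minimal sketch on made-up data. It assumes the variables are standardized first (so the PCA is effectively run on the correlation matrix) and that implicit expansion is available (R2016b or later); the data and variable names are hypothetical, not from the original question.

```matlab
% Hypothetical data: 300 observations of 3 correlated variables.
rng(1);                                   % reproducible random numbers
X = randn(300, 3) * [3 0 0; 1 2 0; 0.5 0.5 1];

% Assumption: standardize first, so the PCA works on the correlation matrix.
Z = zscore(X);
[coeff, score, latent] = pca(Z);

% Loadings by definition: correlation of each variable with each PC score.
loadingsCorr = corr(Z, score);

% Loadings by rescaling: each coefficient column times sqrt(eigenvalue).
loadingsScaled = coeff .* sqrt(latent)';

% Largest disagreement between the two; should be at rounding level.
maxDiff = max(abs(loadingsCorr(:) - loadingsScaled(:)));
```

If the variables are not standardized, the correlations pick up an extra per-variable factor of one over the standard deviation, so the two matrices no longer agree entry by entry.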

the cyclist on 11 Feb 2013
Edited: the cyclist on 13 Feb 2013
Disclaimer: I am not an expert on PCA. [EDIT: Proof of this is that I was wrong that MATLAB scales. See Ilya's answer, and my comment to my own answer, below.]
I believe that this difference is due to the fact that MATLAB first "centers and scales" the original data into z-scores. I am guessing that differences in the loadings are going to be related to that transformation. (Maybe a scaling factor of the standard deviation of each variable?)
The Wikipedia page ( http://en.wikipedia.org/wiki/Principal_component_analysis ) is a good resource. The second paragraph has a brief discussion of the scaling.
  1 Comment
the cyclist on 13 Feb 2013
Mathew, did you ever resolve this? As Ilya pointed out, I was mistaken that MATLAB also scales the data to a z-score. It may be that SPSS does scale. I could not find definitive documentation online about this. I did see that SAS seems to do the scaling automatically. (It's often a good idea to scale, especially if your variables have very different magnitudes.)
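For anyone who wants to see what scaling changes, here is a rough sketch on invented data with deliberately mismatched magnitudes. It assumes zscore is an acceptable way to standardize and that implicit expansion is available (R2016b or later); nothing here reflects how SPSS or SAS are implemented.

```matlab
% Hypothetical data: 3 independent variables on very different scales.
rng(3);                                   % reproducible random numbers
X = randn(400, 3) .* [1 10 100];

% PCA on centered but unscaled data (the default behaviour of pca).
[~, ~, latentRaw] = pca(X);
explainedRaw = latentRaw / sum(latentRaw);

% PCA on z-scored data, i.e. on the correlation matrix.
[~, ~, latentStd] = pca(zscore(X));
explainedStd = latentStd / sum(latentStd);

% Cross-check: eigenvalues of the correlation matrix, sorted descending.
corrEig = sort(eig(corrcoef(X)), 'descend');
```

Without scaling, the large-magnitude variable dominates the first component; after z-scoring, the eigenvalues are those of the correlation matrix and sum to the number of variables.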

Ilya on 11 Feb 2013
The princomp and pca functions center the data but do not scale it. (In addition, pca lets you turn centering off.)
The easiest way to understand PCA is using eigenvalue decomposition of the covariance matrix Sigma:
Sigma = V*Lambda*V'
Lambda is the diagonal matrix of eigenvalues. V is an orthonormal matrix of coefficients. Orthonormality implies that the 2-norm of every column is 1.
This is what the MATLAB implementation does. I am not familiar with the SPSS implementation.
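The relationship above can be checked numerically. The following is a minimal sketch on made-up data (the matrix X and the variable names are hypothetical); it compares pca against a direct eigendecomposition of the sample covariance matrix and confirms that every coefficient column has unit norm.

```matlab
% Hypothetical data: 500 observations of 4 correlated variables.
rng(0);                                   % reproducible random numbers
X = randn(500, 4) * [2 0 0 0; 1 1 0 0; 0 1 1 0; 0 0 1 0.5];

% pca centers the columns by default but does not rescale them.
[coeff, ~, latent] = pca(X);

% Eigendecomposition of the sample covariance, sorted by descending eigenvalue.
Sigma = cov(X);
[V, Lambda] = eig(Sigma);
[lambda, order] = sort(diag(Lambda), 'descend');
V = V(:, order);

% Eigenvalues should match latent; eigenvectors match coeff up to sign.
eigErr = max(abs(lambda - latent));
vecErr = max(max(abs(abs(V) - abs(coeff))));

% Orthonormal columns: the sum of squares of each column is 1.
colNorms = sum(coeff.^2);
```

This is why sum(coeff.^2) returns 1 for every column regardless of how much variance each component explains: the variance lives in latent, not in coeff.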

Seung Yi Lee on 30 Aug 2021
Edited: Seung Yi Lee on 30 Aug 2021
Many years after the original question was posted, I ran into the same problem and then figured it out.
The MATLAB coefficients are normalized, so each column differs from the loadings by a factor of the square root of its corresponding eigenvalue. Converting them to the unscaled loadings with the equation below worked for me.
unscaled_loading = coeff.*sqrt(latent)'
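As a quick sanity check of this conversion, here is a small sketch on invented data (names and data are hypothetical). It assumes implicit expansion (R2016b or later), which the one-liner above also relies on, and verifies the property from the original question: the sum of squared loadings in each column equals that component's eigenvalue.

```matlab
% Hypothetical data: 1000 observations of 5 correlated variables.
rng(2);                                   % reproducible random numbers
X = randn(1000, 5) * randn(5);

[coeff, ~, latent] = pca(X);

% Rescale each unit-norm coefficient column by sqrt of its eigenvalue.
unscaled_loading = coeff .* sqrt(latent)';

% Column sums of squares of the rescaled loadings.
ssLoadings = sum(unscaled_loading.^2);

% Largest relative deviation from the eigenvalues in latent.
relErr = max(abs(ssLoadings - latent') ./ latent');
```

On releases before R2016b, bsxfun(@times, coeff, sqrt(latent)') should give the same result.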
