Dimensional error using PCA

Jaime de la Mota on 9 Jul 2019
Commented: Jaime de la Mota on 10 Jul 2019
Hello everyone. I have written code that uses a Gaussian correlation kernel to generate 1000 realizations of a stochastic process and then performs PCA on the resulting process. The resulting matrix is 501*1000.
However, when I perform the PCA over this matrix, the results contradict the help at https://la.mathworks.com/help/stats/pca.html
The documentation says that if one introduces an n*p matrix, coeff will be a p*p matrix and score an n*p matrix. Here I get different results: coeff is a p*n matrix and score is p*p. The weird thing is that the process is still reconstructed properly. Can anyone tell me what is happening?
Thanks.
Additionally, according to the theory I have read, the coefficients should be standard normal random variables; if I plot the histograms, the resulting variables are normal but not standard. If someone could tell me why they are not standard, I would be very thankful.
The code in question:
close all
clear
clc
[X,Y] = meshgrid(0:0.002:1,0:0.002:1);
Z=exp((-1)*abs(X-Y));
tam=size(X, 1);
number_realizations=1000;
realizacion_mat=zeros(tam, number_realizations);
cov_mat=cov(Z);
[evec_mal, evalM_mal]=eig(cov_mat);
eval_mal=eig(evalM_mal);
num_eval=size(eval_mal,1);
for i=1:num_eval
    eval(i)=eval_mal(num_eval-i+1);
    evec(:,i)=evec_mal(:,num_eval-i+1);
end
figure
hold on
for j=1:number_realizations
    realizacion=zeros(tam, 1);
    for i=1:tam
        v_a = normrnd(0,1);
        realizacion=realizacion+sqrt(eval(i))*evec(:,i)*v_a;
    end
    realizacion_mat(:,j)=realizacion;
    plot(realizacion)
    clear('realizacion')
end
[coeff,score,latent,tsquared,explained,mu] = pca(realizacion_mat,'Centered',false);
reconstruction_process=score*coeff';
diference=reconstruction_process-realizacion_mat;
figure
plot(diference)
for i=1:5
    figure
    histogram(coeff(:,i), 20)
end

Accepted Answer

Jon on 9 Jul 2019
Edited: Jon on 9 Jul 2019
The first argument to pca should be n-by-p, where n is the number of observations and p is the number of variables. You are supplying a p-by-n matrix, so the returned arguments are not dimensioned as you expect. I do not see anything in the MATLAB documentation that discusses the distribution (standard normal) of the coefficients; maybe this is something specific to your application. In any case, if you supply pca with an array in which each row is an observation, you will be off to a good start.
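As a minimal sketch of the orientation issue, assuming a small random matrix in place of realizacion_mat (the names X_wide and X_tall are made up for illustration), the following compares the sizes pca returns for the two layouts. My understanding is that with the default 'Economy' setting pca returns only as many components as there are degrees of freedom, which would explain a non-square coeff when there are fewer rows than columns; please verify this against the documentation.

```matlab
% Illustrative data only: 5 points, 8 realizations (hypothetical sizes).
X_wide = randn(5, 8);         % rows = points, columns = realizations (layout used in the question)
X_tall = X_wide';             % rows = realizations (observations), columns = points

[coeff_w, score_w] = pca(X_wide, 'Centered', false);
[coeff_t, score_t] = pca(X_tall, 'Centered', false);

% Print the sizes so the two layouts can be compared side by side.
fprintf('wide input: coeff %dx%d, score %dx%d\n', size(coeff_w), size(score_w));
fprintf('tall input: coeff %dx%d, score %dx%d\n', size(coeff_t), size(score_t));

% Both layouts reconstruct their own input, which is why the
% reconstruction check alone does not reveal the orientation problem.
err_w = max(max(abs(score_w * coeff_w' - X_wide)));
err_t = max(max(abs(score_t * coeff_t' - X_tall)));
```

For the original code this would amount to calling pca(realizacion_mat', 'Centered', false), so that each of the 1000 realizations is one observation.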
I also suggest that you do not use eval as a variable name for the eigenvalues. eval is a MATLAB function that evaluates an expression. You did not get an error message because MATLAB assumes you want to use eval as a variable rather than as a function. At the least, this makes the code confusing to read for anyone who knows what the eval function does, and if you later wanted to call eval as a function you would run into problems.
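A sketch of the same eigenvalue sorting without shadowing eval, assuming a small symmetric matrix C standing in for cov_mat. The names lambda, order and V are my own choice for illustration, not taken from the original code.

```matlab
% Small symmetric positive semi-definite matrix standing in for cov_mat (illustrative).
A = [2 1 0; 1 3 1; 0 1 4];
C = A * A';

[V_unsorted, D] = eig(C);                      % columns of V_unsorted are eigenvectors
[lambda, order] = sort(diag(D), 'descend');    % eigenvalues, largest first
V = V_unsorted(:, order);                      % reorder the eigenvectors to match

% Sanity check: C * V should equal V * diag(lambda) up to round-off.
residual = max(max(abs(C * V - V * diag(lambda))));
```

Here diag(D) reads the eigenvalues directly off the diagonal, and sort with 'descend' takes the place of the manual reversal loop.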
5 Comments
Jon on 9 Jul 2019
Hi, I'm not familiar with the theoretical background for your problem and have not used principal component analysis in this particular context, so I do not have an immediate answer as to why they are not standard normal variables. I'm sorry, I do not have time to dig deeper, but I would guess that there is a scaling factor somewhere that is not consistent between the two implementations (MATLAB pca and the reference that you are working from).
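One way to probe the scaling guess, sketched under the assumption that rows are realizations and pca is called with its default centering: each score column then has sample variance equal to the matching entry of latent, so dividing by sqrt(latent) gives unit-variance variables. The data and the name xi are hypothetical, and I have not checked whether this is the normalization your reference uses.

```matlab
% Illustrative data: 1000 realizations (rows) of 3 variables with different spreads.
X = randn(1000, 3) * diag([3 2 1]);

[coeff, score, latent] = pca(X);      % default: columns are centered first

% Rescale each score column by the square root of its variance.
xi = bsxfun(@rdivide, score, sqrt(latent'));

spread = std(xi);                     % sample standard deviation of each rescaled column
disp(spread)
```

If the histograms of xi look standard normal while those of the raw outputs do not, the mismatch is likely just this kind of normalization.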
Jaime de la Mota on 10 Jul 2019
Don't worry.
You have helped me enough.
Thanks again.
