PCA output gives NaN after normalizing input matrix
Christopher McCausland
on 17 Dec 2021
Commented: Christopher McCausland
on 17 Dec 2021
Hello,
I have a data set of 117 features and 125941 observations. I performed principal component analysis via svd(x) (code below) and plotted the first three components in a scatter plot against the clinical truth (true/false clinical labels) to look for separation. Separation performance was poor, as I had forgotten to normalize the data first, which is required for PCA.
After applying normalize(obs) I reran the code and found that, after normalization,
[U,S,V] = svd(obs,'econ');
U and V come back as 125941x117 and 117x117 arrays of NaN, while S comes back as (identity matrix .* NaN). I do not understand how normalizing the data can change the output from valid numbers to NaN, given that the 'new' input data is just the old data rescaled.
I will include a copy of the data that works; if you apply normalize(x) to it, the code below returns NaN values instead.
Why would normalizing the data (obs) cause this NaN output?
[U,S,V] = svd(obs,'econ'); % Perform the economy-size SVD
figure
subplot(1,2,1)
semilogy(diag(S),'k-o','LineWidth',2.5) % Singular values on a log scale; a quick drop-off is better
set(gca,'FontSize',15), axis tight, grid on
subplot(1,2,2)
plot(cumsum(diag(S))./sum(diag(S)),'k-o','LineWidth',2.5) % Cumulative share of the singular values per component
set(gca,'FontSize',15), axis tight, grid on
set(gcf,'Position',[1440 100 3*600 3*250])
figure, hold on
for i = 1:size(obs,1)
    x = V(:,1)'*obs(i,:)'; % Project observation i onto the first three components
    y = V(:,2)'*obs(i,:)';
    z = V(:,3)'*obs(i,:)';
    if (grp(i) == 1)
        plot3(x,y,z,'rx','LineWidth',1); % Clinically positive: red x
    else
        plot3(x,y,z,'bo','LineWidth',1); % Clinically negative: blue o
    end
end
xlabel('PC1'), ylabel('PC2'), zlabel('PC3')
view(85,25), grid on, set(gca,'FontSize',15)
set(gcf,'Position',[1400 100 1200 900])
Accepted Answer
David Goodmanson
on 17 Dec 2021
Hi Christopher,
I am not making any comment on the svd procedure, but
f = find(all(obs==0))
f =
79 80 81 82 83 84 85 86 87 88 89 90 91
says that the indicated columns consist entirely of zeros. For those columns, 'normalize' is trying to scale a standard deviation of 0 to a standard deviation of 1, which amounts to computing 0/0, so it returns NaN for every entry in those columns.
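As a minimal sketch of one way to work around this (not necessarily the right choice for every data set), the constant columns can be dropped before normalizing. The small matrix below is a made-up stand-in for obs, and the names nanCols, keep, obsClean and scores are hypothetical, chosen only for illustration.

```matlab
% Made-up stand-in for obs: 6 observations, 4 features; column 3 is all zeros.
obs = [1 2 0 4;
       2 1 0 3;
       3 5 0 8;
       4 3 0 1;
       5 8 0 6;
       6 4 0 2];

% z-scoring a constant column computes 0/0, so that column becomes NaN.
Z = normalize(obs);
nanCols = find(all(isnan(Z)));   % columns that are entirely NaN after normalize

% Keep only the columns whose standard deviation is nonzero (hypothetical fix).
keep = std(obs) > 0;
obsClean = normalize(obs(:, keep));

% The SVD of the cleaned, normalized matrix contains only finite values.
[U, S, V] = svd(obsClean, 'econ');

% Project every observation onto the first three components in one step.
scores = obsClean * V(:, 1:3);
```

Note that V now has one row per kept feature, so the projection has to use the same reduced, normalized matrix (obsClean here) rather than the original obs.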