# Principal Component Analysis / Singular Value Decomposition: great with ovariancancer dataset, terrible with my data

Christopher McCausland on 16 Dec 2021
Hello,
I am currently getting to grips with PCA. I came across a great tutorial from Steve Brunton on its use with MATLAB. This tutorial makes use of the ovariancancer dataset included with MATLAB and works very well for separating the data; however, when I try to apply it to my own data, the separation is nowhere near as clear. So, my questions:
1. Is there a description of what each of the 4000 features in the ovariancancer dataset represents, and of any pre-processing done on them?
2. For anyone maths-minded, what would cause this to work well for one dataset and not the other? I can see my rank is very high, but I cannot understand why.
What is my data? My data is 13-channel PSG recordings, which I split into 10-second windows with 5-second overlaps. For each window and channel I then calculate the mean, median, mode, variance, standard deviation, interquartile range, range, kurtosis and skewness. This gives me 117 features (9*13). I will include the first 1000 rows of features and clinical truth, as the data is open source anyway. The code, which works well, is included below:
```matlab
% load ovariancancer  % works great with this feature set but poorly with others.
% I have renamed my uploaded variables to match this example (obs, grp).
[U,S,V] = svd(obs,'econ');

% Singular values (log scale) and their cumulative fraction
figure
subplot(1,2,1)
semilogy(diag(S),'k-o','LineWidth',2.5)
set(gca,'FontSize',15), axis tight, grid on
subplot(1,2,2)
plot(cumsum(diag(S))./sum(diag(S)),'k-o','LineWidth',2.5)
set(gca,'FontSize',15), axis tight, grid on
set(gcf,'Position',[1440 100 3*600 3*250])

% Project each observation onto the first three right singular vectors
figure, hold on
for i = 1:size(obs,1)
    x = V(:,1)'*obs(i,:)';
    y = V(:,2)'*obs(i,:)';
    z = V(:,3)'*obs(i,:)';
    if (grp(i) == 1)
        plot3(x,y,z,'rx','LineWidth',1);
    else
        plot3(x,y,z,'bo','LineWidth',1);
    end
end
xlabel('PC1'), ylabel('PC2'), zlabel('PC3')
view(85,25), grid on, set(gca,'FontSize',15)
set(gcf,'Position',[1400 100 1200 900])
```
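As a rough diagnostic for question 2, the sketch below checks how unevenly the features are scaled and how much of the singular-value sum the first three components capture. It also shows that the per-row projection loop is equivalent to a single matrix product. The `obs` and `grp` values here are random, hypothetical stand-ins (not the PSG data), and the names `scores`, `scaleRatio` and `energy3` are introduced only for illustration.

```matlab
% Hypothetical stand-in data; replace with the real obs (rows = windows) and grp
rng(1);
obs = randn(100, 8) .* [1 1 1 1 50 50 500 500];   % features on very different scales
grp = double(rand(100, 1) > 0.5);                 % 1 = one class, 0 = the other

[U, S, V] = svd(obs, 'econ');

% Project every row onto the first three right singular vectors at once;
% row i of scores holds the x, y, z values the loop computes for observation i
scores = obs * V(:, 1:3);

% Spread of feature scales: a large ratio suggests a few features dominate
featStd    = std(obs, 0, 1);
scaleRatio = max(featStd) / min(featStd);
fprintf('max/min feature standard deviation: %.1f\n', scaleRatio);

% Fraction of the singular-value sum captured by the first three components
sv      = diag(S);
energy3 = sum(sv(1:3)) / sum(sv);
fprintf('first 3 components: %.2f of the singular-value sum\n', energy3);
```

The columns of `scores` can be passed straight to `plot3` with logical indexing, for example `plot3(scores(grp==1,1), scores(grp==1,2), scores(grp==1,3), 'rx')`, which avoids one plot call per row.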
Christopher McCausland on 17 Dec 2021
For anyone who comes across this: I believe my problem was the large difference in variance between the features. I tried to resolve this by using the normalize function; however, this leads to U, S and V being returned as arrays of NaN values. I will continue to update this if I make any progress, but I am not sure why normalisation causes this NaN output.
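The cause cannot be confirmed from the post alone, but two common candidates are worth ruling out. A feature that is constant across all windows has zero standard deviation, so standardising it computes 0/0 = NaN. A NaN or Inf already present in the features will also propagate into the decomposition. The sketch below standardises by hand so that both cases can be detected and removed first. The data is a hypothetical stand-in, and dropping rows and columns is only one possible way to handle them.

```matlab
% Hypothetical stand-in for the real feature matrix (rows = windows, columns = features)
rng(0);
obs = randn(200, 6) .* [1 10 100 1000 1 1];
obs(:, 5) = 3;      % constant feature: its standard deviation is 0
obs(7, 6) = NaN;    % a single missing value

% 1) Drop rows that contain NaN or Inf
goodRows = all(isfinite(obs), 2);
X = obs(goodRows, :);

% 2) Find features with zero standard deviation (z-scoring them divides by zero)
mu    = mean(X, 1);
sigma = std(X, 0, 1);
keep  = sigma > 0;
fprintf('Removed %d rows and %d constant features\n', sum(~goodRows), sum(~keep));

% 3) Standardise the remaining features to zero mean and unit variance
Z = (X(:, keep) - mu(keep)) ./ sigma(keep);

[U, S, V] = svd(Z, 'econ');
```

If rows are removed this way, the same `goodRows` mask has to be applied to `grp` so that the labels still line up with the rows of `Z`.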

