Why does filtering data before PCA improve results?

I have a set of images that I want to discriminate using PCA. I noticed that applying low-pass filtering (using filter2) to the images before feeding them into PCA greatly improves the results: it increases the relative amount of variance in the first PCs, and the components correspond more closely to what I expect. This leads me to a more general question: why does filtering improve the results? I have two conflicting intuitions:
  • On the one hand, the performance is better simply because filtering reduces the noise in the images.
  • On the other hand, filtering is only a linear transformation of the data, and the principal axes found by PCA should be "dragged" by this linear transformation and give the exact same results.
Would you have any clues to help me clarify this?

7 Comments

Maybe, but first provide a demonstration of what you mean.
My understanding of spatial filtering is that it is equivalent to a convolution of the image with a kernel profile, so each pixel is replaced by a weighted average of the neighbouring pixels (the weights being given by the kernel profile). This is a linear operation, isn't it? Or did I miss something?
The spatial filtering is linear, but I don't know why you think PCA is invariant to linear transformations of the observations. The following simplified example shows that it is not.
X=rand(7,5); X=X-mean(X);
[U,S,V]=svd(X,0); PCA1=U*S
PCA1 = 7×5
   -0.5159    0.3722   -0.3550    0.0126   -0.0581
    0.6617    0.6046    0.1841    0.0151   -0.0022
    0.2549   -0.3709   -0.0693    0.2474    0.0109
    0.2677   -0.2852   -0.2614    0.1047   -0.0047
   -0.4143   -0.1693    0.5058    0.0784   -0.0438
    0.1698   -0.2888   -0.0159   -0.4318   -0.0113
   -0.4240    0.1374    0.0118   -0.0264    0.1091
[U,S,V]=svd(X*rand(5),0); PCA2=U*S
PCA2 = 7×5
   -0.5649    0.4691   -0.1119    0.0754   -0.0108
    1.1273    0.1412   -0.2793   -0.0462   -0.0124
    0.2503   -0.4315   -0.0264    0.0420    0.0115
   -0.2608   -0.3301   -0.1545    0.0349    0.0043
    0.5185    0.0112    0.4618    0.0151   -0.0156
   -1.0065   -0.1389    0.0153   -0.0824   -0.0166
   -0.0639    0.2791    0.0950   -0.0387    0.0396
My observations had already forced me to believe this; what I am asking is why my intuition is not correct. To elaborate: imagine a simplified case where the images are made of only 2 pixels, so that each image can be represented as a point in a 2D space. A set of N images is then a cluster of N points in a plane. For ease of imagination, let's say there is a strong correlation between the two pixels, such that the cluster is concentrated around a particular line (call it L). PCA would then find PC1 close to L. Applying a linear transformation to the data applies a scaling/rotation/shearing to the cluster in the plane, and therefore also to L. Let's say that L gets transformed to a new line L'. PCA applied to the transformed cluster should find PC1 close to L', and therefore the projections of the images on L' should be the same as they were on L before the transformation (within a scaling factor). I'm happy to hear where I'm wrong in this.
(In this context I mean by linear that the effect of the filtering can be represented by a matrix multiplication of the data.)
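A toy sketch of this 2-pixel picture (my own numbers, purely illustrative): take a low-pass "filter" on two pixels to be the averaging matrix F below. F is linear but not orthogonal: it leaves the [1;1] (signal) direction alone while shrinking the [1;-1] (noise) direction, so the fraction of variance captured by PC1 increases, consistent with the observation in the question.

```matlab
rng(0);                                 % reproducible toy data
t = randn(1,1e4);                       % 1-D "signal"
X = [t; t] + 0.5*randn(2,1e4);          % signal along [1;1], isotropic noise
F = [0.75 0.25; 0.25 0.75];             % 2-pixel moving-average "filter"
s  = svd((X - mean(X,2))');             % singular values, raw data
Y  = F*X;
sf = svd((Y - mean(Y,2))');             % singular values, filtered data
share  = s(1)^2  / sum(s.^2)            % fraction of variance on PC1
shareF = sf(1)^2 / sum(sf.^2)           % larger: filtering boosts PC1's share
```

Note that F here does not even rotate the principal axes (it shares eigenvectors with the data covariance), yet it changes the singular-value spectrum, which is exactly why the PCA output differs.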
Convolution f*g is linear wrt f and wrt g.
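To see that linearity concretely, here is a minimal 1-D sketch (my own construction, not from the thread): convolution with a fixed kernel g is multiplication by a Toeplitz matrix T, so filtering a signal f is the linear map f -> T*f.

```matlab
f = randn(5,1);                          % a short signal
g = [1; 2; 1]/4;                         % low-pass kernel
c1 = conv(f, g, 'same');                 % filter the usual way
% Build the equivalent convolution (Toeplitz) matrix by hand
T = toeplitz([g(2); g(3); zeros(3,1)], [g(2) g(1) zeros(1,3)]);
c2 = T*f;                                % same result: filtering is linear in f
err = norm(c1 - c2)                      % ~ 0 up to round-off
```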
Let me try to understand your question. I wrote this extremely simple code to get a feel for how filtering improves PCA, and my conclusion is quite the opposite:
M=diag([1,100]);
x=randn(2,1e6);
y=M*x;
% PCA of Non filtered data
[U,S,V]=svd(y',0);
PCA=V(:,1);
if PCA(2)<0
PCA=-PCA;
end
nfiltererror = norm(PCA-[0;1])
nfiltererror = 1.8027e-05
% PCA of filtered data
xf = mean(x,2);
yf = M*xf;
[Uf,Sf,Vf]=svd(yf',0);
PCAf=Vf(:,1);
if PCAf(2)<0
PCAf=-PCAf;
end
filtererror = norm(PCAf-[0;1])
filtererror = 0.0279
if filtererror < nfiltererror
fprintf('filter is better\n');
else
fprintf('non-filter is better\n');
end
non-filter is better
So what do you observe? Can you make a MWE (example with 2 pixels?) to show it?


Answers (1)

PCA applied to the transformed cluster should find PC1 close to L', and therefore the projections of the images on L' should be the same as they were on L (within a scaling factor)
That is true for a rotation, but for arbitrary linear transformations, it is not true when the dimension of L is greater than 1. We can recraft my example above to examine how the singular values change under an arbitrary transformation when L and L' are 2D:
X=rand(7,2); X=[X,X]; X=X-mean(X);
S1=svd(X,0)
S1 = 4×1
    1.4299
    1.0318
    0.0000
    0.0000
S2=svd(X*rand(4),0)
S2 = 4×1
    2.0355
    0.2737
    0.0000
    0.0000
Clearly, the change is also more than just a global scaling:
S1./S2.*[1 1 0 0]'
ans = 4×1
    0.7025
    3.7701
         0
         0
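To complete the picture, a small contrasting sketch (toy data of my own): an orthogonal transformation, built here from a qr factorization of a random matrix, preserves the singular values exactly, while a generic linear transformation does not. This is why only rotations/reflections leave the PCA picture unchanged, and why a low-pass filter (which shrinks high-frequency directions) can legitimately change the variance distribution over the PCs.

```matlab
rng(1);
X = rand(7,4); X = X - mean(X);
[Q,~] = qr(randn(4));            % random orthogonal matrix
S  = svd(X);
So = svd(X*Q);                   % equals S up to round-off
Sa = svd(X*rand(4));             % generally differs from S
maxOrthDiff = max(abs(S - So))   % ~ 1e-15
```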

Release: R2021b

Asked: 2 Aug 2022

Edited: 2 Aug 2022
