Dominating Frequencies of FFT and Lomb Periodogram do not agree - Why?
3 views (last 30 days)
I calculate the Fourier transform and the Lomb Periodogram of data which contain NaNs. The signal is sampled regularly at a rate of 96 per day, see attached .mat file. Missing data ('NaN') are set to zero before calculating the FFT. (Alternatively, I have removed the NaNs by interpolation of adjacent non-NaNs, which resulted in essentially the same Fourier frequencies).
Based on the Fourier transform, 1/day is the dominant frequency. The Lomb periodogram indcates a dominant frequency of 2/day. What is the cause of this disagreement?
Fs = 96; % Sampling frequency (per day)
T = 1/Fs; % Sampling period
L = 96*212; % Total number of time steps
t = (0:L-1)*T; % Time vector (in units of days)
[Plomb,flomb] = plomb(yy,t*60*60*24,4e-4,'power');
plot(flomb*60*60*24,Plomb); grid on
title('Lomb Periodogram of yy(t)')
xlabel('Frequency (per day)')
yy(isnan(yy))=0; % Setting NaNs to zero
Y = fft(yy); % FFT
P2 = abs(Y/L);
P1 = P2(1:L/2+1);
P1(2:end-1) = 2*P1(2:end-1); %single-sided spectrum
f = Fs*(0:(L/2))/L;
plot(f,P1); grid on
title('Single-Sided Amplitude Spectrum of yy(t)')
xlabel('f (per day)')
Martijn on 17 Dec 2015
It does not look like there is anything with how you visualize the results. The results really are different because you use two completely different approaches with different assumptions.
With plomb you assume your data can be modeled as a sum of sinusoids of different frequencies, amplitudes and phases. And it determines the amplitudes and phases using a least-squares method. I.e. you compute the error between the measured data and the estimated data and then minimize this error. While computing the error, and this is is really the important part here, you completely ignore the datapoints for which the measured data was NaN, the algorithm does not assume the values were 0 or interpolate them, they are completely ignored. When you manually try some least-squares fits with sine waves with frequency 1/day and 2/day:
At first the 1/day fit may look better because you do not have these peaks where there is no original data, but when you actually realize that you ignored the NaNs:
You should be able to see that the 2/day sine wave actually gives a better fit; so it is not surprising to see that the most of the power goes into that frequency.
With your FFT approach you do make assumptions about the NaNs, you replace them with zeros. In general this would not be such a good idea as it would add all kinds of additional frequencies to your data. In your very particular case however it looks like the NaNs occur periodically so then replacing them with zeros is very close to multiplying your data with a square wave. Which in the frequency domain would give a convolution with the odd harmonics of the square wave which I guess would not give a too bad effect on the shape of your spectrum; so that approach might work here.
Still whether you should ignore the NaNs, assume them to be zero or interpolate in some way more depends on what this data really represents and what you are trying to accomplish. Is this a day/night cycle with only measurements during the day? Are you measuring a natural phenomenon where in the night it is safe to assume that the data would have been (close to) zero? Then assuming 0 would totally make sense, but if not perhaps you really need a different model to represent your data, fit that model to the data and then draw conclusions from the fitted parameters of this model.