Finding peaks in a very noisy signal

Hi, I have a large set of noisy signals, see examples below.
I want to calculate the mean frequency of the pulsations, and to do that I first have to detect the peaks numerically. I have an algorithm, but it relies heavily on empirically chosen parameters for the findpeaks function, and it typically takes me 3-4 tries before I get them right for a given dataset.
Can someone point me in the right direction on how to make my algorithm more adaptive, or how to approach this differently? My code is pasted below. Code in .mlx and test files added as an attachment.
filename = uigetfile("Test*.mat");
if isequal(filename, 0), return, end % uigetfile returns 0 on cancel
load(filename,"dt","signal")
time = dt*(1:length(signal));
halfM = 0.5*mean(signal);
% First find maxima separated by at least five time steps, with a minimum prominence of 1% of the max value in the whole dataset
[pks, locs] = findpeaks(signal, time,"MinPeakDistance",5*dt,"MinPeakProminence",0.01*max(signal));
% Then find minima to make sure closely spaced, sharp peaks are differentiated in the next step
[min_pks, min_locs] = findpeaks(-signal, time,"MinPeakDistance",5*dt,"MinPeakProminence",0.01*max(signal));
% Combine minimum and maximum points
pks = [pks -min_pks]; locs = [locs min_locs];
% Sort the points for further processing
[locs, idx] = sort(locs); pks = pks(idx);
% Custom values for minimum separation and maximum peak width
minSep = 5e-9; maxPW = 1e6;
% Plot the results with peak prominences and widths marked
figure
findpeaks(pks, locs, "MinPeakProminence", halfM, "MinPeakHeight", 2*halfM, "MinPeakDistance", minSep, "MaxPeakWidth", maxPW,"Annotate","extents");
% Plot the original signal with all peaks and valleys marked in small orange dots,
% and major peaks marked in large yellow dots
figure, plot(time, signal), hold on, scatter(locs, pks, Marker=".",LineWidth=1)
[pks,locs] = findpeaks(pks, locs, "MinPeakProminence", halfM, "MinPeakHeight", 2*halfM, "MinPeakDistance", minSep, "MaxPeakWidth", maxPW);
scatter(locs, pks, Marker="o", MarkerFaceColor="flat"), hold off
xlabel 'Time [\mus]', ylabel 'Signal power [a.u.]'

Accepted Answer

Your signals have broadband noise, so a frequency-selective filter is not going to apply here. The Savitzky-Golay filter (or wavelet denoising) is best suited for signals with broadband noise, so I implemented it here. Change the ‘framelen’ value to give the best result. (I chose a 3rd-order polynomial, since that generally works best in my experience; however, you can experiment with that as well.)
Try this —
files = dir('*.mat');
for k = 1:numel(files)
LD = load(files(k).name);
Fs(k) = 1/LD.dt;
s{k} = LD.signal;
L = numel(s{k});
t{k} = linspace(0, L-1, L)/Fs(k);
end
figure
tiledlayout(numel(files),1)
for k = 1:numel(files)
nexttile
plot(t{k}, s{k})
grid
xlabel('Time')
ylabel('Amplitude')
title(string(files(k).name))
end
sgtitle('Original')
figure
tiledlayout(numel(files),1)
for k = 1:numel(files)
nexttile
[FTs1,Fv] = FFT1(s{k},t{k});
plot(Fv, abs(FTs1)*2)
grid
xlabel('Frequency')
ylabel('Magnitude')
xlim([0 5E+8])
title(string(files(k).name))
end
sgtitle('Fourier Transforms')
figure
tiledlayout(numel(files),1)
for k = 1:numel(files)
nexttile
framelen = 1+5E+3;
s_filt = sgolayfilt(double(s{k}), 3, framelen);
plot(t{k}, s_filt)
grid
xlabel('Time')
ylabel('Amplitude')
title(string(files(k).name))
end
sgtitle('Savitzky-Golay Filtered Signal')
function [FTs1,Fv] = FFT1(s,t)
% Arguments:
% s: Signal Vector Or Matrix
% t: Associated Time Vector
t = t(:);
L = numel(t);
if size(s,2) == L
s = s.';
end
Fs = 1/mean(diff(t));
Fn = Fs/2;
NFFT = 2^nextpow2(L);
FTs = fft((s - mean(s)) .* hann(L).*ones(1,size(s,2)), NFFT)/sum(hann(L));
Fv = Fs*(0:(NFFT/2))/NFFT;
% Fv = linspace(0, 1, NFFT/2+1)*Fn;
Iv = 1:numel(Fv);
Fv = Fv(:);
FTs1 = FTs(Iv,:);
end

9 Comments

Looks promising, I'll definitely try it. One question though, do you think I could somehow use the FFT data to automatically pick framelen value based on the signal characteristics? You have nicely visualized the fact that I have very different frequency content between different signals, so fixed filter parameters will not be enough to capture the variation here. I really need to automate it without the need to manually pick a value each time, because I have >100 datasets, and while many of them are similar, I don't have a good way of categorizing that.
Thank you!
‘... do you think I could somehow use the FFT data to automatically pick framelen value based on the signal characteristics?’
Unfortunately, no. The FFT can be very useful in choosing the cutoff frequencies for frequency-selective filters; however, the ‘framelen’ value depends on the noise characteristics. I experimented a while ago with using the period (1./Fv) of the FFT for this, however it did not produce any useful result. It is necessary to just ‘cut and try’ different ‘framelen’ values, as in many signal processing problems. The only useful information that it is possible to get from the signal with respect to the ‘framelen’ value is the length of a regularly-sampled signal. I generally begin with a ‘framelen’ value about 1% of the signal length, and go from there.
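[Editor's note: the "about 1% of the signal length" starting heuristic can be sketched as follows. This is a Python/SciPy analogue for illustration, not the MATLAB from the thread; `auto_framelen` is a hypothetical helper name. The frame length is forced odd and larger than the polynomial order, as both sgolayfilt and scipy's savgol_filter require.]

```python
import numpy as np
from scipy.signal import savgol_filter

def auto_framelen(n_samples, fraction=0.01, polyorder=3):
    """Start with a frame length ~1% of the signal length, forced odd
    and strictly greater than the polynomial order."""
    fl = max(int(round(n_samples * fraction)), polyorder + 2)
    if fl % 2 == 0:
        fl += 1
    return fl

# demo on a noisy sine
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 5000)
x = np.sin(2 * np.pi * 5 * t) + 0.3 * rng.standard_normal(t.size)
fl = auto_framelen(x.size)        # 51 for a 5000-sample signal
smooth = savgol_filter(x, fl, 3)  # same shape as x, noise reduced
```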
Since my signals have a lot of high-frequency content, I had to go way down with 'framelen'. I found values in the 30-50 range best.
You seem like an experienced DSP user, so I'll pick your brain if you don't mind. I tried doing an FT of the raw signals and calculating the weighted mean frequency, akin to a center-of-mass calculation where every point is weighted by its distance (frequency). As expected, I got a mean time (the inverse of the mean frequency) that is shorter for TEST1 and TEST4, and longer for TEST2 and TEST3, which makes sense just by visual inspection, since 1 and 4 need shorter windows to preserve signal characteristics. Does this method make any sense to you?
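[Editor's note: the weighted-mean-frequency idea described here, i.e. the "centre of mass" of the power spectrum, can be sketched in Python for illustration (`weighted_mean_freq` is a hypothetical helper, not code from the thread). MATLAB's meanfreq computes a comparable PSD-weighted mean.]

```python
import numpy as np

def weighted_mean_freq(x, fs):
    """Power-spectrum-weighted mean frequency: sum(f * PSD) / sum(PSD),
    a 'centre of mass' of the one-sided spectrum."""
    X = np.fft.rfft(x - np.mean(x))          # remove DC before transforming
    psd = np.abs(X) ** 2
    f = np.fft.rfftfreq(x.size, d=1 / fs)
    return np.sum(f * psd) / np.sum(psd)

# demo: for a pure 20 MHz tone the weighted mean is the tone frequency
fs = 1e9                                     # 1 GHz sampling
t = np.arange(10000) / fs
x = np.sin(2 * np.pi * 20e6 * t)
fm = weighted_mean_freq(x, fs)               # ~2e7 Hz
```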
Also, I tried a range of window lengths and polynomial orders, and got very similar results at, for example, poly = 3, framelen = 31 and at poly = 9, framelen = 71. In that case, I would obviously choose the first configuration since it's going to be so much faster, right?
Thanks in advance for the discussion, I'm happily surprised that I got any answers, since this is not a simple problem and I wasn't sure if anyone would have the time and expertise to pick it up!
P.S. I wonder if ML could be useful here, but that's way beyond my current expertise.
I don’t mind at all. I like signal processing problems!
As I mentioned, I usually begin with ‘framelen’ values about 1% of the signal length, and then experiment.
I generally do not do much with the mean frequency, and when I do, I tend to use the meanfreq function. A weighted mean, where the weighting vector is the frequency, would obviously tend to give more emphasis to the higher frequencies. It seems to me that this emphasizes high-frequency noise, if any is present. (The inverse of the mean frequency would be the mean period.)
The method definitely makes sense, although it is not my usual approach. I am not certain what you want from the signal vectors you have, since I do not know your objective.
I haven’t timed it, however I suspect the (3,31) option would be faster.
What does ‘ML’ refer to here?
I'm sorry, ML meant Machine Learning. I see this acronym a lot, but I should've made it clear.
Good point about the weighted mean frequency. I write a lot of code on my own because I like to fully understand and control what's going on, but sometimes I miss good built-in functions because of that. I will check what results I get from meanfreq.
My ultimate objective is to estimate the pulsation frequency in the signals. It's really hard because, as you probably noticed, even in a single dataset I have a wide range of peak heights, peak widths, peak prominences; there are multi-peak formations... it's a headache. My current approach is to roughly detect peaks, calculate the distances between consecutive peaks, and estimate the pulsation frequency based on that.
I plot and analyze parameters such as peak height, peak width, and peak prominence to estimate cutoff parameters for MATLAB's findpeaks function, but my biggest problem is that I always have to pick the parameters for each dataset manually. I can write an algorithm that works amazingly well for a certain dataset, but I'm having a hard time making it robust. I feel that, with the huge difference between magnitudes of different peaks and their widths, I have to resort to a localized method, which performs a moving-window analysis of the signal first and pre-determines the parameter values for the signal processing. I wish I could do a Savitzky-Golay filter that changes window size / polynomial order dynamically within one array of data, but my gut feeling is that even if I write a function that does that, it will take forever to execute.
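[Editor's note: the "detect peaks, take inter-peak distances, invert" approach can be made per-dataset adaptive by expressing the prominence threshold as a fraction of each signal's own range instead of an absolute value. A Python/SciPy sketch for illustration (`pulsation_freq` and its `prom_frac` parameter are hypothetical names, not from the thread):]

```python
import numpy as np
from scipy.signal import find_peaks

def pulsation_freq(signal, dt, prom_frac=0.5):
    """Rough pulsation-frequency estimate: detect peaks whose prominence
    exceeds a fraction of the signal's own range (adaptive, no hand-tuned
    absolute threshold), then invert the median inter-peak spacing."""
    prom = prom_frac * (signal.max() - signal.min())
    peaks, _ = find_peaks(signal, prominence=prom)
    if peaks.size < 2:
        return float('nan')
    return 1.0 / (np.median(np.diff(peaks)) * dt)

# demo: a 20 MHz pulse train (50 ns spacing) buried in broadband noise
rng = np.random.default_rng(1)
dt = 1e-9                       # 1 ns sample step
x = 0.1 * rng.standard_normal(2000)
x[25::50] += 2.0                # pulses every 50 samples
f_est = pulsation_freq(x, dt)   # ~2e7 Hz
```

The median (rather than the mean) of the spacings makes the estimate less sensitive to occasional missed or double-counted pulses.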
I still do not understand what your objective is, and the system you are measuring. Two options that may be helpful are wavelet analysis, and the emd (empirical mode decomposition) function (introduced in R2018a). (This is similar to independent components analysis for blind-source separation.) This occurred to me this morning, although it took me a while to remember what the function is, since I don't use it often.
Try this —
files = dir('*.mat');
for k = 1:numel(files)
LD = load(files(k).name);
Fs(k) = 1/LD.dt;
s{k} = LD.signal;
L = numel(s{k});
t{k} = linspace(0, L-1, L)/Fs(k);
end
figure
tiledlayout(numel(files),1)
for k = 1:numel(files)
nexttile
plot(t{k}, s{k})
grid
xlabel('Time')
ylabel('Amplitude')
title(string(files(k).name))
end
sgtitle('Original')
figure
for k1 = 1:numel(files)
L = numel(s{k1});
sd = resample(s{k1}, L, fix(L*1E+1));
% sd = sum(sin((1 : 50 : 200).' * 2*pi*(0:1E+4-1)/1E5 + (1:4).' )).'
tr = linspace(0, numel(s{k1}), numel(sd)).'/Fs(k1);
[imf{k1},resid] = emd(sd);
szimf = size(imf{k1});
disp("imf size = "+szimf(1)+" rows, "+szimf(2)+" columns")
figure
tiledlayout(ceil(size(imf{k1},2)/2),2)
for k2 = 1:size(imf{k1},2)
nexttile
plot(tr, imf{k1}(:,k2))
grid
xlabel('Time')
ylabel('Amplitude')
title("imf(:,"+k2+")")
end
sgtitle("Empirical Mode Decomposition Analysis Of: "+string(files(k1).name))
end
imf size = 10000 rows, 10 columns
imf size = 10000 rows, 9 columns
imf size = 10000 rows, 10 columns
imf size = 10000 rows, 8 columns
This code saves the results in a cell array, and also illustrates how to index into it (in the plot calls). Columns 6 or 7 of the ‘imf’ matrix may be the results you want.
I doubt that it would be possible to improve on this result.
With findpeaks, I generally use the 'MinPeakProminence' name-value pair to select only the most prominent peaks. That neatly avoids problems with choosing peak amplitudes ('MinPeakHeight') while still returning the significant peaks in a region.
With respect to the pulses, if they have irregular intervals, modeling them as a Poisson process (poissfit) could be worth exploring.
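[Editor's note: for a Poisson process the intervals between pulses are exponentially distributed, and the maximum-likelihood rate is simply the reciprocal of the mean interval. A Python sketch of that sanity check, for illustration (`poisson_rate` is a hypothetical helper; MATLAB's poissfit works on event counts rather than intervals):]

```python
import numpy as np

def poisson_rate(event_times):
    """MLE of the rate of a Poisson process from pulse arrival times:
    intervals are exponential, so rate = 1 / mean(intervals)."""
    iv = np.diff(np.sort(np.asarray(event_times)))
    return 1.0 / np.mean(iv)

# demo: simulate arrivals at a 20 MHz mean rate (50 ns mean spacing)
rng = np.random.default_rng(0)
true_rate = 2e7
times = np.cumsum(rng.exponential(1 / true_rate, size=5000))
rate = poisson_rate(times)      # close to 2e7
```

Comparing the empirical interval histogram against the fitted exponential would show whether the "irregular intervals" are actually Poisson-like or have a preferred spacing.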
I am pleased that the sgolayfilt function seems to be working for you. I use it frequently with signals having broadband noise. It essentially works as a FIR comb filter (visible by calculating the transfer function by dividing the Fourier transform of the filtered output by the Fourier transform of the input), however it’s generally easier to implement and is not sensitive to the signal length.
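[Editor's note: the transfer-function calculation described above, dividing the output spectrum by the input spectrum, can be sketched in Python/SciPy for illustration (not the responder's MATLAB; the probe signal and parameters are arbitrary choices):]

```python
import numpy as np
from scipy.signal import savgol_filter

# Empirical transfer function of a Savitzky-Golay filter: divide the
# spectrum of the filtered output by the spectrum of a broadband input.
rng = np.random.default_rng(0)
x = 5.0 + rng.standard_normal(8192)   # broadband probe with a DC offset
y = savgol_filter(x, 31, 3)           # framelen 31, 3rd-order polynomial
H = np.fft.rfft(y) / np.fft.rfft(x)
f = np.fft.rfftfreq(x.size)           # normalised frequency, 0..0.5
# |H| is ~1 near DC (polynomials up to the filter order pass unchanged),
# with the comb-like ripple appearing at higher frequencies.
```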
I’m currently reviewing independent components analysis in order to write a function to perform it. (At least one already exists in the File Exchange, however I want to understand how it works, and coding it myself is the best way to do that.) That would separate the various components of your signal in order to isolate the noise.
I’m not certain what you’re doing, or whether the noise is part of the process or due to the instrumentation. If it’s instrumentation noise, using shielded (coaxial) leads, effective grounding, and a reference lead may be ways to control it.
Vitek Stepien on 19 Sep 2024
Edited: Vitek Stepien on 19 Sep 2024
Star Strider, it was a pleasure working with you. I did some more tweaking with findpeaks, sorting the pulses and trying to pick only the "right ones", but I don't seem to be making good progress. This is for my work, but that particular piece of analysis was more interesting to me personally than important in the professional sense, so I'm putting it on hold and moving on for now. Thanks for sharing your knowledge, I definitely learned something new! If I ever pick it up and make significant progress, I'll update this thread.
P.S. This data is all generated with a numerical model, so there is no instrumentation impact at all. All of the broadband noise comes from the dynamics of the modeled process itself.
Thank you!
As always, my pleasure!
I intend to continue working on my independent components analysis function, since I believe it can be helpful here, and in other applications. When I get it up and running successfully, I will update that analysis here as well.
I wish you well in your research.
If I can be of any further help, please let me know.

Sign in to comment.

More Answers (1)

Vitek Stepien on 18 Sep 2024
Let me explain my work a bit, but first comments on the previous answers:
  • I often have very noisy signals to analyze, and I use a bunch of different filtering techniques, but I wasn't familiar with the S-G filter. It works really well, I'm very happy with the quality-preserving smoothing I get, so thank you for introducing me to it!
  • I was vaguely familiar with EMD but haven't used it in a long time. I don't think it helps me a lot here.
  • I use wavelet transform for other aspects of my work, but I'm not sure if it would here. I believe it produces qualitatively similar results to the EMD, but in a more continuous format.
Now to what I'm doing: I'm analyzing the output of a nonlinear process called stimulated Brillouin scattering, an optical phenomenon that manifests itself in the generation of an optical wave that, in the context of my work, propagates through the system in the direction opposite to the original signal. This generated wave is seeded by noise and amplified in a nonlinear fashion, producing a not-quite-random signal that is full of noise, with more or less sporadic pulses in power. This power vs. time is the data in the four test files. In the figure below, I marked three such pulses with arrows.
The problem is that these pulsations are not quite periodic, and the signal is not really zero at other times - it fluctuates all the time. The distance between the first two peaks is about 50 nanoseconds, and between the next two is about 65 nanoseconds, so they are pretty close. Visually inspecting a whole bunch of datasets I see that there is a regularity to the pulsations, and so I wanted to estimate the regularity. See below a sample histogram for one of the datasets:
As you can see, I can roughly calculate the mean time between consecutive pulsations, and invert that to get an estimate of the mean pulsation frequency. But I'm not happy with how the pulsations are detected, because depending on the algorithm I use, some of them are overlooked, others are counted multiple times because they have a more complex structure, etc. See an example of my old algorithm's output below:
Yellow dots mark the pulsations detected by the algorithm. Now "eyeballing", I would say there are only two large pulses here, marked with the red arrows. Obviously, many more were detected, including the really small one on the right side, which skews the analysis. I'm trying to improve the algorithm so that it doesn't count the high-frequency spikes, or the tiny peaks on the sides.

Products

Release

R2022b
