help me to do nonlinear fitting use lsqcurvefit

Questo/a domanda è stato/a segnalato/a da Mathieu NOE
  • Segnalato come Duplicato da Mathieu NOE il 19 Dic 2024.
please help me, I am begginer in matlab and thsi is my first time doing nonlinear fitting, I have problem to fit data with the model non linear fitting (here I use lsqcurvefit), the flow of my code is something like this
  • get autocorrelation function(ACF), isolate half of it
  • get envelop of ACF,
  • fit envelope of ACF with equation E=1/t^2 exp (-2*pi*f*Qc) to get optimum Qc (with f is my freq=3, t is time lapse)
here the ilustration from source that i use
Here is my code, but i don't really trust my self since the Qc value is not match with range (is uppose to get Wc between 100-400) but using my code i get thousand. Please is anyone can help me to clarify my code
clear; clc; %close all;
% Define the folder containing ACF files
folder_path = pwd; % Adjust the path to your ACF files
file_pattern = 'cBPCen3_s6h_ACF_HGSD_HHZ_2023-12_2023-12-01T00_00.txt'; % Match ACF filenames
files = dir(fullfile(folder_path, file_pattern));
% Define the model function for nonlinear fitting
f=3;
m=2;
model = @(Qc, t) (1 ./ t.^m).*exp(-2 * pi * f * t ./ Qc);
% Initial guess for Qc
Qc_initial = 110;
options = optimset('Display', 'off'); % Suppress output
% File to store Qc_fitted_envelope results
results_file = fullfile(folder_path, 'Qc_fitted_envelope_results.txt');
if exist(results_file, 'file')
delete(results_file); % Clear old results if file exists
end
% Loop over each ACF file
for i = 1:length(files)
% Load ACF file
file_name = files(i).name;
file_path = fullfile(folder_path, file_name);
acf = load(file_path);
% Extract the ACF data
y_obs = acf((length(acf)/2):(end))'; % Adjust this as needed
% Compute the squared envelope of the ACF
envel = envelope(y_obs,5,'peak')';
envel = envel(1:round(0.5 * length(envel)));
tenvel = (1:length(envel))';
% Perform nonlinear optimization for envelope
Qc_fitted_envelope = lsqcurvefit(@(Qc, tenvel) model(Qc, tenvel), Qc_initial, tenvel, envel, [], [], options);
y_fitted_envelope = model(Qc_fitted_envelope, tenvel);
rms_error_envelope = sqrt(mean((envel - y_fitted_envelope).^2));
% Plot results for each file - DEACTIVATE FOR BIG LOOPING
figure;
plot(tenvel/100,envel, 'k-', 'DisplayName', 'Envelope (Observed)');
hold on;
plot(tenvel/100, y_fitted_envelope, 'r-', 'LineWidth', 1.5, 'DisplayName', sprintf('Envelope (Fitted, RMS=%.4f)', rms_error_envelope));
xlabel('Time (s)');
ylabel('Amplitude');
title(sprintf('Envelope Fit - Qc = %.4f, RMS Error = %.4f', Qc_fitted_envelope, rms_error_envelope));
legend('Location', 'northeast');
%set(gca, 'YScale', 'log')
grid on;
% Save the envelope to a text file - DO YOU REALLY NEED SAVE THIS??
% output_file_name = ['envel_' file_name];
% output_file_path = fullfile(folder_path, output_file_name);
% save(output_file_path, 'envelope', '-ascii');
% fprintf('Envelope saved to %s\n', output_file_name);
% Save the Qc_fitted_envelope result to the text file
fid = fopen(results_file, 'a'); % Open in append mode
fprintf(fid, '%s\t%.4f\n', file_name, Qc_fitted_envelope);
fclose(fid);
% Display results in the command window
fprintf('File: %s\n', file_name);
fprintf(' Envelope: Fitted Qc = %.4f, RMS Error = %.4f\n\n', Qc_fitted_envelope, rms_error_envelope);
end

6 Commenti

duplicate post
why not simply continue the first one ?
and what about the suggestions we already made ?
I understand that you already help me with the previous question, but your answer isuggestion using damping equation instead of using the equation i mention, and my aim is not solely getting Qc but also want to fit data by those equation and also i want to know why my code has wrong
hello again and happy new year !
there are still some points that are unclear to me
  • acf time vector : your code assume a 1 second time interval (and also t start at t= 1 , why ?) :
tenvel = (1:length(envel))';
  • considering the senond half of data (after the central peak) you still have 2500 samples - so according to the look of your data once plotted , and compared to the posted image (time psan is 0 to 30 seconds) , I doubt that the time interval could be 1 second.
a second point is why the first plot (of A) shows a peak amplitude of autocorrelation which is not one ?
equation E=1/t^2 exp (-2*pi*f*Qc) assumes that ACF should be normalized I assume ?
maybe you should share the publication so we could see the larger context
hi @Mathieu NOE, happy new year too. Thanks for still reply in this post.
After read again my code, I realize that I put acf time vector without considering my sampling rate. Kindly check here my revision. parameter "part" i used to chose how much data that I want to fit in. 1 means 100percent data i used to fit, but eventhough i change it to 50% or 25%, the Qc that i get sems does not really change.
Related with second question
"a second point is why the first plot (of A) shows a peak amplitude of autocorrelation which is not one ? "
I also has no ide about that. the author does not explain it in detail. (the publication already mention by @William Rose below, thanks a lot)
the Please give me advise if something that strange in my code.

Accedi per commentare.

 Risposta accettata

I agree with the comments of @Mathieu NOE. He raises very good points - including the recommendation to share the source of the figure.
The source is https://academic.oup.com/gji/article/168/3/1029/2042341: Wegler & Sens-Schonfelder. 2007. "Fault zone monitoring with passive image interferometry. Geophys J Intern 168(3): 1029–1033. https://doi.org/10.1111/j.1365-246X.2006.03284.x
There are two things to note in order to get the fit to work:
1. The y-axis units are "arbitrary", and the authors say "the envelope E(t) should obey E(t) ∼ 1/t2  exp(−2πft/Qc)". The "~" above means "proportional to". Therefore, we must fit the unknown constant of proportionality. Thus the predicted y-value is
yPred=A*t.^(-2).*exp(-2*pi*f*t/Qc); where A and Qc are unknown, and f=3 Hz (specified in the text).
2. The figure shows E and fitted E on a logarithmic y-axis. I had unsatisfactory results when I used fmincon() and when I used fitdata() to estimate A and Qc by minimizing the sum squared error between y and yPred. Specifically, fmincon() and fitdata() stop on the initial quess, at least when the initial guess for [Qc,A] is [1,1]. This happens because the objective function (the sum squared error) is quite insensitive to changes in Qc and A. To fix this, I minimize the sum squared difference between log(y) and log(yPred). With this adjustment, the minimization routine gives good results, as shown in the plot below.
My fitted Qc=133; the authors get Qc=150. The difference may be due to the fact that I digitized data from a figure, and interpolated between my samples. Therefore the data I fitted does not exactly equal the data which Wegler & Sens-Schonfelder fitted.
Code to fit the data, and the file of points digitized from Figure 2B, are attached.

9 Commenti

In repsonse to your questions above:
The peak of the autocorrelation function (acf) in Figure 2A is not one because Wegler and Sens-Schonfelder (2007) report the acf in arbitrary units. They use arbitrary units because they set the seismogram to zero during and after earthquakes. (They do this because they are studying the "noise" in seismogram data, and earthquakes are not "noise".) They say "We found that setting certain time windows of the noise to zero has negligible influence on the shape of the ACF, whereas it does influence the amplitude of the ACF." Since their process alters the amplitude, they report the ACF in arbitrary units.
Regarding the time base, you have
t = linspace(1,length(E)/100,length(E)); % Time lapse
The code above results in a time vector that starts at t=1 s and has a sampling rate of approximaytely 102 Hz. I think your intention was to have a sampling rate of 100 Hz, which would match the sampling rate of Wegler and Sens-Schonfelder. Also, the time vector for the ACF should start at t=0, not t=1. Therefore I recommend you do
t = (0:length(E)-1)/100; % Time lapse, sampling rate=100 Hz
When you fit the data, fit from t=2 to t=20, like Wegler and Sens-Schonfelder in Figure 2B. This means you don't have to worry about the fact that the fitting function blows up at t=0. You can do this as follows:
tFit=t(t>=2 & t<=20);
EFit=E(t>=2 & t<=20);
...
Qc_estimated = lsqcurvefit(@(Qc,t) model(Qc,t), Qc_initial,tFit, EFit, 0, 350, options);
If you want to follow the example of Wegler and Sens-Schonfelder, then you should fit the square of the ACF, like they do. Do not fit the envelope obtained with envel = envelope(acfsqr,100,"peak");. The envelope obtained with envelope(acfsqr,100,"peak") is negative from about t=1.7 to t=2.7, and is negative from about t=5 to t=6.5. Negative values for the envelope do not make sense.
In your revised code, f=1. In your earlier code, f=3, which matches the value used by Wegler and Sens-Schonfelder. Why did you change f ? Changes in f will cause proportional changes in the fitted value for Qc.
When I modify your revised code to include the suggestions above, and use your data, I get the following results (see attached code):
The fit above uses f=1, as per your revised code. If you use f=3, thren fitted Qc=7.96.
The fit above does not include a scaling factor. The fit in my answer, earlier above, does include a scaling factor. The difference is because in my answer, I fitted the data of Wegler and Sens-Schonfelder (2007) (WSS), and your revised code fits your data. Your data is the true autocorrelation, or close to it. (The value is close to unity at 0 lags.) Therefore your ACF is in true units, so it does not need an arbitrary scaling factor.
The bottom plot shows that, on a logarithmic axis, the fit is poor for Time>5. To fix this, minimize the error beteeen log(ACF^2) and log(predicted ACF^2). I think WSS minimized the error between log(data) and log(model).
Your ACF^2 is quite different from the ACF^2 in Figure 2B of WSS. Therefore you should not expect to get the same results for your fit.
thanks for clearly explain, but let me clarify some :
I used f=1 because i applied bandpass filter with central frequency 1 (and the paper used 3, because they use central freq 3),
Regarding to square envelope, WSS mention that the compute square envelope of the ACF, by this reason I calcuate ACF, square it and then take envelope of it then calculate Qc based on it. Do you thing i following the wrong step at this part?
You write: "I used f=1 because i applied bandpass filter with central frequency 1 (and the paper used 3, because they use central freq 3)". Since WSS say they used a highpass filter with 2 Hz cutoff (see Section 2), I was unsure what the "central frequency" refers to. WSS 2007 say, in section 3, "the envelope E(t) should obey E(t) ∼ 1/t2 exp(−2π f t/Qc ), where t is lapse time, f is central frequency and Qc is the coda decay rate", and they cite Aki & Chouet (1975) (A&C). A&C Figure 4 (attached) shows that "center frequency" does in fact refer to the center of a bandpass filter, so I think your use of f=1 is correct.
WSS say in Section 3 "we computed the squared envelope of the ACF (Fig. 2b)". This statement, combined with the data in Fig 2b, make me think WSS define the "envelope" to be the squared ACF. I think the downward spikes in Fig 2b are where the un-squared ACF changes sign.
One can examine A&C to try to determine if the equation given by WSS is meant to describe the squared autocorrelation. The closest equation I see in A&C to the equation in WSS is A&C equation 27, p.3329, which is for the amplitude of the coda wave.
where a=m/2. If we assume the waves are body waves, then m=2 (see A&C p.3325, just after equation 9). If we square both sides and substitute for ω, we get
which matches the equation in WSS. Therefore I think it is appropriate to apply the equaiton in WSS to the square of the autocorrelation.

I also cheched those equation and not found the envelope terms. The might be explaination that i can think of is because Qc is measure the decay rate of signal therefore it is more esy to fit envelope than ACF. but of course it only my assumption.

dear all
@William Rose , thanks for the hard work and sharing the publication.
with your code and with the digitized data (now we have ACF squared envelope as y data) I confirm the optimization will converge to Q =133 (with f = 3)
this is basically the same code using fminsearch as in the initial post
% Import raw data, digitized from Figure 2B
% x axis : time
% y axis : ACF squared envelope
data=load('plot-data.csv');
% Interpolate to a constant sampling rate
tInt=(data(1,1):.02:data(end,1))';
yInt=interp1(data(:,1),data(:,2),tInt);
% Fit the data from t=2 to t=20.
t=tInt(tInt>=2 & tInt<=20);
y=yInt(tInt>=2 & tInt<=20);
%% fminsearch
% Define the model function for nonlinear fitting
f=3;
m=2;
% ic guess
A = 10;
Q = 100;
model = @(A, Qc, t) (A ./ t.^m).*exp(-2 * pi * f * t ./ Qc);
%
obj_fun = @(params) norm(log(model(params(1),params(2),t))-log(y));
IC = [A, Q];
sol = fminsearch(obj_fun, IC);
A = sol(1)
A = 56.5133
Q = sol(2)
Q = 133.4454
modelfit = model(A, Q, t);
Rsquared = my_Rsquared_coeff(y(:),modelfit(:));
% y linear display
figure(1),plot(t,y,t,modelfit)
title(['Data fit , R² = ' num2str(Rsquared)]);
xlabel('time [s]'); ylabel('amplitude')
legend('ACF squared','model fit');
% y log display
figure(2),semilogy(t,y,t,modelfit)
title(['Data fit , R² = ' num2str(Rsquared)]);
xlabel('time [s]'); ylabel('amplitude')
legend('ACF squared','model fit');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Rsquared = my_Rsquared_coeff(data,data_fit)
% R2 correlation coefficient computation
% The total sum of squares
sum_of_squares = sum((data-mean(data)).^2);
% The sum of squares of residuals, also called the residual sum of squares:
sum_of_squares_of_residuals = sum((data-data_fit).^2);
% definition of the coefficient of correlation is
Rsquared = 1 - sum_of_squares_of_residuals/sum_of_squares;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I could not get any correct Q values with the data provided in posted file : cBPCen1_s2D1D_ACF_HGSD_HHZ_.HHZ-.HHZ_2023-12_2023-12-01T00_00.txt
Simply the ACF data looks very different from the digitized plot. Wonder how this was obtained and how this compares to the data used in the publication
I get bizarre results with squared acf or squared envelop in both cases.
FYI :
y = readmatrix('cBPCen1_s2D1D_ACF_HGSD_HHZ_.HHZ-.HHZ_2023-12_2023-12-01T00_00.txt');
%% load second half of data
[val,ind] = max(y);
y = y(ind:end); % second half of signal after the peak
Fs = 100; % sampling rate
dt = 1/Fs; %
t = 0 + dt*(0:numel(y)-1)'; % time start at t=0 or t = 1 ?
% envelop
[env] = env_secant(t, abs(y), 25, 'top'); % option with env_secant (see function attached)
%% acf / env squared
ysqr = y.^2;
% ysqr = env.^2;
%% model fit on data extract
ind = 2<t & t<20;
t2 = t(ind);
ysqr2 = ysqr(ind);
%% fminsearch
% Define the model function for nonlinear fitting
A = 1e-2;
f=3;
m=2;
Q = 100; % ic guess
model = @(A, Qc, t) (A ./ t.^m).*exp(-2 * pi * f * t ./ Qc);
%
obj_fun = @(params) norm(log(model(params(1),params(2),t2))-log(ysqr2));
IC = [A, Q];
sol = fminsearch(obj_fun, IC);
A = sol(1);
Q = sol(2);
modelfit = model(A, Q, t2);
Rsquared = my_Rsquared_coeff(ysqr2(:),modelfit(:));
% y linear display
figure(1),plot(t,ysqr,t2,modelfit)
title(['Data fit , R² = ' num2str(Rsquared)]);
xlabel('time [s]'); ylabel('amplitude')
legend('ACF squared','model fit');
% y log display
figure(2),semilogy(t,ysqr,t2,modelfit)
title(['Data fit , R² = ' num2str(Rsquared)]);
xlabel('time [s]'); ylabel('amplitude')
legend('ACF squared','model fit');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Rsquared = my_Rsquared_coeff(data,data_fit)
% R2 correlation coefficient computation
% The total sum of squares
sum_of_squares = sum((data-mean(data)).^2);
% The sum of squares of residuals, also called the residual sum of squares:
sum_of_squares_of_residuals = sum((data-data_fit).^2);
% definition of the coefficient of correlation is
Rsquared = 1 - sum_of_squares_of_residuals/sum_of_squares;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Hy @Mathieu NOE, may I know why usec "A = 1e-2" instead of put A=1 or in previous code A = 10; ?
I calculate ACF using yam package (https://yam.readthedocs.io/en/latest/) using different stacking and bandpass filter compare to the data used in WSS publication. I used lower stacking (in paper they mention it used 20 day stacking whilw i only used 2 day stacking with 1 day stacking overlap) because I have shorter time span data also filter with central freq 1 (that why for cBPCen1_s2D1D_ACF_HGSD_HHZ_.HHZ-.HHZ_2023-12_2023-12-01T00_00.txt data , f should be 1 instead of 3)
I am not sure if this is causing the probelm such as bizzare resut as you mention.
@Mathieu NOE, you're welcome, and thank you for showing how to do the minimizaiton with fminsearch(). Thank you for showing that, when using data digitized from the published figure 2B, you get a fit like the fit I found (Qc=133). Your use of norm() as the objective function is more elegant than what I did. The value r^2=0.28 for the fit is rather low. One could argue that, since we fitted the published data by minimizing the differences of the log of the data and the log of the model, we should use the logs when computing r^2. I bet that if we did that, r^2 for the fit to the published data would be larger.
@Mathieu NOE, thank you for attaching the interesting and useful function env_secanrt.m.
@Mathieu NOE says "the ACF data looks very different from the digitized plot" I agree. Here is a plot comparing the squared autocorrelation of W. & S.-S., 2007 to the squared autocorrelation of nirwana.
data1=load('cBPCen3_s6h_ACF_HGSD_HHZ_2024-04_2024-04-05T12_00.txt');
acf1=data1(5001:end);
acf1sq=acf1.^2;
t1=(0:length(acf1)-1)'/100;
data2=load('plot-data.csv');
t2=data2(:,1); acf2sq=data2(:,2);
semilogy(t1,acf1sq,'-r.',t2,acf2sq,'-bx')
title('Seismogram ACF^2'); xlabel('Time (s)'); ylabel('ACF^2')
legend('nirwana','W&SS, 2007'); grid on
@Mathieu NOE and I found Qc=133 when fitting the log of the WSS data from 2 to 20 Hz. This Qc is close to the value Qc=150 reportsed by WSS 2007. @Mathieu NOE says "I get bizarre results with squared acf or squared envelope in both cases", when fitting the data supplied by @nirwana. When I fit the ACF^2 data of @nirwana, I get the following:
% Fit the nirwana data from t=2 to t=20.
t1fit=t1(t1>=2 & t1<=20);
acf1sqfit=acf1sq(t1>=2 & t1<=20);
f=1; % center frequency of bandpass filter used to record seismogram (Hz)
fnc = @(p)sumsqerr(p,t1fit,acf1sqfit,f);
p0=[1,1]; % initial guess for [Qc,A]
p=fmincon(fnc,p0,[],[],[],[],[1e-3,0],[1e4,1e6]);
Local minimum possible. Constraints satisfied. fmincon stopped because the size of the current step is less than the value of the step size tolerance and constraints are satisfied to within the value of the constraint tolerance.
Qc=p(1);
A=p(2);
fprintf('Qc=%.2g, A=%.2g.\n',p)
Qc=99, A=0.0035.
% Plot ACF^2 from nirwana and ACF^2 predicton from the fit
yPred=A*t1fit.^(-2).*exp(-2*pi*f*t1fit/Qc);
figure; semilogy(t1,acf1sq,'-r.',t1fit,yPred,'-g.')
title(['ACF^2, Qc=',num2str(Qc,'%.1f')]);
xlabel('Time (s)'); ylabel('ACF^2')
legend('nirwana','Model Fit'); grid on
The fitted value Qc=99 is within the bounds of reason. The fit includes adjustable parameter A, a scaling factor.
yep
definitively there is a big difference between the two data sets . No just a magnitude factor (that would simply lead to different A values) but the shape / slope are not the same. There must be a reason why, but this has probably to be searched in the process used to generate the data.

Accedi per commentare.

Più risposte (0)

Categorie

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by