The difference between SSE and MAE in curve fitting and optimization

I am trying to fit several series of data, taken at different frequencies, to a custom model, but I don't get a reasonable result or fit for some of the data series. The data come from a nonlinear system with multiple distinct peaks. The fitting function is essentially a fraction with a product of two exponential functions in both the numerator and the denominator. One of the two exponentials is an exponential of a sum of Gaussian functions, whose fitting parameters are the amplitude and width of each Gaussian and the spacing between the Gaussians; the last fitting parameter is the argument of the other exponential.
I have tried the MATLAB Curve Fitting Toolbox (with both the Trust-Region and Levenberg-Marquardt algorithms) and the Optimization Toolbox (including MultiStart and GlobalSearch), where I tried to minimize the SSE between the actual data and the value predicted by the model. But none of them helped.
I am wondering: is minimizing SSE a good measure for optimization in my case? Also, in the Curve Fitting Toolbox, SSE is one of the goodness-of-fit measures.
I know of another measure, MAE (mean absolute error), but I am not sure how it is properly defined, or whether minimizing it would be more helpful than minimizing SSE, or whether I would essentially see the same behavior that I see with SSE.
I appreciate your comments and suggestions!

Accepted Answer

In most curve-fitting problems, you will not see a big difference when you minimize SSE versus minimizing the sum of absolute errors (SAE). Minimizing SSE has the nice property that, if the errors are normally distributed, it produces a maximum likelihood estimate, which is nice from a statistical viewpoint. Minimizing SAE has the advantage that it is more robust, that is, it is less skewed by outliers.
You can minimize SAE by defining a function that returns the SAE, then using fmincon() to find the parameters that minimize the SAE function.
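The robustness difference is easy to demonstrate outside MATLAB as well. Below is a small Python/SciPy sketch of the same idea (a toy example of my own, not the attached script): a straight line is fitted to data containing one large outlier, once by minimizing SSE and once by minimizing SAE. The model, parameter names, and data are all made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
a_true, b_true = 2.0, -1.0
y = a_true * x + b_true + rng.normal(0, 0.2, x.size)
y[5] += 20.0  # inject one large outlier

def sse(p):
    # Sum of squared errors between data and the line p[0]*x + p[1]
    return np.sum((y - (p[0] * x + p[1])) ** 2)

def sae(p):
    # Sum of absolute errors between data and the line p[0]*x + p[1]
    return np.sum(np.abs(y - (p[0] * x + p[1])))

p_sse = minimize(sse, [1.0, 0.0], method="Nelder-Mead").x
p_sae = minimize(sae, [1.0, 0.0], method="Nelder-Mead").x
print("SSE fit:", p_sse)  # dragged noticeably toward the outlier
print("SAE fit:", p_sae)  # stays closer to the true (2, -1)
```

The SSE fit is pulled toward the outlier, while the SAE fit stays near the true parameters, which is the robustness advantage described above.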
I am attaching an example script that fits three parameters by minimizing SSE and by minimizing SAE. The script includes an SSE function and an SAE function. fmincon() is used twice: first to find the parameters that minimize SSE, then to find the parameters that minimize SAE. Both sets of fitted parameters are displayed, along with the true parameter values. Here is an example of the console output produced by the script:
>> fitdata1Dexample
Fitting results:
Minimize SSE: Fitted a,b,c = -1.904 5.036 0.424
Minimize SAE: Fitted a,b,c = -1.855 5.043 0.373
True values: a,b,c = -2.000 5.000 0.500
You will get different results each time you run it, since the random noise is different every time.
The function to compute SAE is
%% SAE function
function SAE = sumabserr1D(params)
%SUMABSERR1D Sum of absolute error between data and model prediction
% y=vector of data
% a,b,c = model parameters
% x = locations at which the function model1D() should be evaluated
% model1D() = function describing the model
% Function sumabserr1D() is to be minimized in a model fitting routine
% WCRose 2022-05-08
global x y;
a=params(1);
b=params(2);
c=params(3);
SAE=sum(abs(y-model1D(x,a,b,c)));
end
where model1D() is defined in the script.
Good luck.

8 Comments

@Shaily_T, you said the data you are fitting are taken at different frequencies and that there are multiple distinct peaks. This makes me wonder if the data you are fitting are values from a power spectrum. If you are fitting values from a power spectrum, you might be interested to know that, if the noise in the time domain is normally distributed, then the power spectrum values have a chi-squared distribution.
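For what it's worth, this chi-squared behavior is easy to check numerically. The sketch below (a toy Python/NumPy example of my own, not from the thread) computes the periodogram ordinate of unit-variance Gaussian white noise at one interior frequency bin over many trials. Scaled by 1/n, that ordinate behaves like a chi-squared variable with 2 degrees of freedom scaled to mean 1 (i.e. an exponential), so its sample mean and variance should both be near 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 256, 4000
vals = np.empty(trials)
for t in range(trials):
    x = rng.normal(0, 1, n)          # Gaussian white noise, unit variance
    X = np.fft.rfft(x)
    vals[t] = np.abs(X[10]) ** 2 / n  # periodogram ordinate at interior bin 10

# For a chi-squared(2) variable scaled to mean 1 (an exponential),
# the variance equals the squared mean, so both should be near 1.
print(vals.mean(), vals.var())
```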
Also, I have found, when fitting a peaky experimental spectrum (force generated by subjects who have tremor), that a Lorentzian function fits the spectral peak better than a Gaussian. The Lorentzian is also known as the Cauchy distribution when used in statistics. In many kinds of spectroscopy, the theoretically expected shape of a spectral peak is Lorentzian (here).
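To make the comparison concrete, here is a small Python/NumPy sketch of the two peak shapes (function names and parameterizations of my own choosing); the key practical difference is the Lorentzian's much heavier tails:

```python
import numpy as np

def gaussian(x, A, x0, sigma):
    # Gaussian peak: amplitude A, centre x0, width sigma
    return A * np.exp(-((x - x0) ** 2) / (2 * sigma ** 2))

def lorentzian(x, A, x0, gamma):
    # Lorentzian (Cauchy) peak: amplitude A, centre x0, half-width gamma
    return A * gamma ** 2 / ((x - x0) ** 2 + gamma ** 2)

x = np.linspace(-10, 10, 2001)
g = gaussian(x, 1.0, 0.0, 1.0)
l = lorentzian(x, 1.0, 0.0, 1.0)
# Far from the centre the Gaussian is essentially zero,
# while the Lorentzian decays only like 1/x^2.
print(g[-1], l[-1])
```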
Thank you so much for your explanation and the attached code! So, if I want MAE, it is the same as SAE except that I should use the mean instead of the sum. Is that correct?
About your second comment actually I am not sure. I can elaborate more on what I have.
I have 15 series of experimental data taken at different frequencies. I am trying to fit each series, taken over a different frequency range, to a function and obtain the values of the fitting parameters (I have 4 fitting parameters), then use those values in another equation to calculate the efficiency of the system and compare it with the experimental efficiency. The problem is that for some of these data series I don't obtain a good fit, so the calculated efficiency is far from the experiment. One problem, I think, is that the fitted curve with the better goodness-of-fit measure (i.e., lower SSE) is not the best fit in terms of what I see visually or in terms of the calculated efficiency. I have also tried the Optimization Toolbox algorithms and solvers, where I likewise minimized the SSE function, but it didn't help. So I was thinking maybe I should minimize something else (e.g., MAE or SAE) instead of SSE. But from your example and explanation, it seems it will not make a huge difference.
I have attached two fitted curves to the same data with different values of SSE. From the obtained efficiency and what I see I think the one with worse SSE is a better fit.
Thanks for your time!
Yes, use mean instead of sum, if you prefer MAE. Changing from SAE to MAE will not affect the fit, because the parameters that minimize SAE will also minimize MAE, and vice versa.
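Since MAE is just SAE divided by the fixed number of data points, the two objectives differ by a constant positive factor and therefore share the same minimizer. A quick numerical check of that claim (a toy Python example of my own, fitting a single location parameter to made-up data):

```python
import numpy as np
from scipy.optimize import minimize_scalar

y = np.array([1.0, 2.0, 2.5, 4.0, 100.0])  # toy data with one outlier

sae = lambda c: np.sum(np.abs(y - c))   # sum of absolute errors
mae = lambda c: np.mean(np.abs(y - c))  # mean absolute error = sae / len(y)

c_sae = minimize_scalar(sae, bounds=(0, 100), method="bounded").x
c_mae = minimize_scalar(mae, bounds=(0, 100), method="bounded").x
# Both objectives are minimized at the same point (the median of y),
# even though their minimum *values* differ by the factor len(y).
print(c_sae, c_mae)
```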
It is interesting and a bit disappointing that the parameters that give the best fit lead to efficiency predictions that are quite different from the experimental observations.
Be careful when evaluating a best fit by eyeballing the curves ("chi-by-eye"). I agree that the plot labelled "lower SSE" looks like a worse fit than the plot labelled "bigger SSE" - which is the opposite of what we expect. But remember that what is being fitted (I assume) is the vertical distance between each yellow point and each corresponding blue point. And it is impossible to tell, by looking at these figures, whether the sum of the squared distances between points is worse in the first or second plot. The slight sideways offset between the curves in the "bigger SSE" plot produces extremely large errors in the vertical direction, but our eye sees the curves as close together, because they are so close horizontally. So we cannot trust chi-by-eye.
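This effect is easy to demonstrate numerically. In the Python/NumPy sketch below (a toy narrow Gaussian peak of my own, not the actual data), a curve shifted slightly sideways "looks" close to the data but has a far larger SSE than a curve at the right location with the wrong height:

```python
import numpy as np

x = np.linspace(0, 10, 500)
# Narrow Gaussian peak (width 0.2) centred at x0
peak = lambda x0: np.exp(-((x - x0) ** 2) / (2 * 0.2 ** 2))

y_data = peak(5.0)
y_shifted = peak(5.3)       # visually "close": shifted sideways by 0.3
y_scaled = 0.8 * peak(5.0)  # visually "worse": right place, wrong height

sse = lambda yhat: np.sum((y_data - yhat) ** 2)
# The sideways shift dominates: vertical errors near the peak are huge.
print(sse(y_shifted), sse(y_scaled))
```

Because the peak is narrow, a small horizontal offset makes the vertical residuals near the peak enormous, even though the eye judges the curves to be nearly on top of each other.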
Thanks for your response!
Indeed, the interesting and weird part is that it works for some of the frequencies, where I get reasonable fits, but it does not work for some other frequencies.
So, even when the fit looks better for the bigger SSE, I should still rely on SSE. Is that correct? Then I should find the reason why the parameters that give the best fit lead to efficiency predictions that are quite different from the experimental observations.
What I have noticed in my fits is that I get inconsistent results (including this SSE discrepancy) for some of the data series. I tried the 'Levenberg-Marquardt' algorithm in the Curve Fitting Toolbox, but the result changes every time I change the start points. Also, I have 4 fitting parameters, and every time it automatically holds two of them fixed at the start point and only fits the other two (this is not too bad, because the two fixed parameters are more or less predictable from the fits). So I thought it would be helpful to see the results of GlobalSearch and MultiStart in the Optimization Toolbox, and I tried them with different solvers and algorithms. However, even with MultiStart and GlobalSearch, I get results close or identical to those of the local solvers. One suggestion was to do a grid search for the global minimum, so first I tried to plot the function I want to optimize versus two of my fitting parameters (with the other two held constant) to get a sense of what is going on. I have attached the resulting surface plot (SSE is the function to be minimized; a, d, and sigma are the parameters I want to obtain by minimization). Do you think a grid search would be helpful in my case?
I understand this is another question in itself, but I thought it would be nice to know your thoughts on it.
I appreciate your time and comments!
[edited 5/23/2022: Corrected a typo in the code: replaced "~sumsqerr" with "@sumsqerr"]
Levenberg-Marquardt is a method for finding the minimum. Ideally, it will not change the minimum that is found; it may just get there in a more or less efficient way. I wrote my own L-M routine in Pascal, based on the Numerical Recipes book, in the 80s. I suspect that MATLAB's fmincon() is better at incorporating constraints of various types; constraints may be hard to implement in standard Levenberg-Marquardt. MATLAB's fmincon() probably has other features that L-M may lack.
You are wise to be concerned about getting stuck in a local minimum. When I do multidimensional fitting, I always start from multiple points in the N-dimensional parameter space, to improve the odds that I find the true global minimum and reduce the odds of ending up in a local minimum that is not the global minimum. I would try 2^N or 3^N starting points, chosen near the corners (if 2^N) or near-corners plus midpoints (if 3^N) of the N-dimensional hypercube of parameter space. With N=4 and 3^N points, this means 81 starting points. Do the fit 81 times and remember the output of each trial. The best fit is the one with the lowest MAE or SSE or whatever you are minimizing.
Example with N=4. I am fitting parameters a, b, c, d. The bounds are [amin,amax], [bmin,bmax], etc. I want to start at points that are at 10%, 50%, and 90% of each range:
N=4;
amin=0; amax=1; bmin=-2; bmax=2; cmin=0; cmax=10; dmin=-5; dmax=5;
a0=amin+[.1,.5,.9]*(amax-amin);
b0=bmin+[.1,.5,.9]*(bmax-bmin);
c0=cmin+[.1,.5,.9]*(cmax-cmin);
d0=dmin+[.1,.5,.9]*(dmax-dmin);
p0=zeros(3^N,N); %array for initial guesses
for i=1:3
    for j=1:3
        for k=1:3
            for m=1:3
                p0(m+3*(k-1)+9*(j-1)+27*(i-1),:)=[a0(i),b0(j),c0(k),d0(m)];
            end
        end
    end
end
The code above creates an 81x4 array. Each of the 81 rows is a different starting point.
Display the first 3 and last 3 starting points:
disp(p0(1:3,:)); disp(p0(79:81,:))
    0.1000   -1.6000    1.0000   -4.0000
    0.1000   -1.6000    1.0000         0
    0.1000   -1.6000    1.0000    4.0000
    0.9000    1.6000    9.0000   -4.0000
    0.9000    1.6000    9.0000         0
    0.9000    1.6000    9.0000    4.0000
In this example, you would call fmincon() 81 times, using a different row of p0 as the start point each time.
p=zeros(3^N,N); %allocate array for best-fit parameters from each trial
sse=zeros(3^N,1); %array for sum squared error from each trial
for i=1:3^N
[p(i,:),sse(i)]=fmincon(@sumsqerr,p0(i,:),[],[],[],[],[amin,bmin,cmin,dmin],[amax,bmax,cmax,dmax]);
end
[ssebest,ibest]=min(sse);
fprintf('Best (lowest) SSE=%.3f\n',ssebest)
fprintf('Best parameters: %.3f, %.3f, %.3f, %.3f\n',p(ibest,:))
where sumsqerr() is a function written by you that computes the quantity which you want to minimize.
Try something like that.
Shaily_T on 11 May 2022
Edited: Shaily_T on 11 May 2022
I will try that!
So, I don't need to use MultiStart or GlobalSearch here; I should just use fmincon() with these several start points. Is that correct?
Thank you so much for your time, explanation and sharing this code! I appreciate it.


More Answers (1)

I think using MSE or SSE would tend to pull the fit toward real outliers, to get closer to them, while MAE (mean absolute error or median absolute error) would tend to fit better overall but may be way off at the outlier points. This is because when you square the differences, outliers far from the fit have much greater influence. But I could be wrong about that. If you don't have any really bad outliers, the two fits may be very close to each other.
