Why am I unable to recreate the curve fitting equation?

18 views (last 30 days)
Sunil on 31 May 2020
Edited: John D'Errico on 1 Jun 2020
I used MATLAB's built-in Curve Fitting Tool to fit the following data:
x = [5 10 15 20 25 30 35 40 45 50]
and
y = [140 88 62 49 38 31 25 20 17 12]
I used the two-term exponential equation to generate the fitting curve.
The following results were obtained:
General model Exp2:
f(x) = a*exp(b*x) + c*exp(d*x)
where x is normalized by mean 27.5 and std 15.14
Coefficients (with 95% confidence bounds):
a = 0.2758 (-0.1069, 0.6585)
b = -3.521 (-4.346, -2.696)
c = 34.03 (32.91, 35.15)
d = -0.6419 (-0.6992, -0.5846)
Goodness of fit:
SSE: 3.376
R-square: 0.9998
Adjusted R-square: 0.9996
RMSE: 0.7501
I recreated the equation of the curve using the same coefficients a = 0.2758, b = -3.521, c = 34.03 and d = -0.6419 in the equation y1 = a*exp(b*x) + c*exp(d*x). When I run it in the command window, I get the following output for y1:
y1 =
1.374022395352651
0.055478622562340
0.002240049060184
0.000090446005331
0.000003651919963
0.000000147452830
0.000000005953673
0.000000000240390
0.000000000009706
0.000000000000392
I am unable to understand why there is such a big mismatch between y1 and y.
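In code form, the computation was essentially:
% coefficients copied from the Curve Fitting Tool output (4 significant digits)
a = 0.2758; b = -3.521; c = 34.03; d = -0.6419;
x = [5 10 15 20 25 30 35 40 45 50];
y1 = a*exp(b*x) + c*exp(d*x)     % evaluates the model at the raw x values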
1 Comment
John D'Errico on 1 Jun 2020
Edited: John D'Errico on 1 Jun 2020
READ MY ANSWER TO THE END. It solves your problem, showing what you did, and reproducing the garbage numbers you got for y1, pretty much exactly.
Essentially, the problem you have in producing y1 is this: if you do a fit using a normalized version of x in fit, then you need to build that normalization into your model. It is now part of your model.
The proof is that when I did the fit using the normalized version of x, it produced the same coefficients you got. So the problem is NOT in the fit itself, because I can then predict y pretty accurately, even if I use only the approximate set of coefficients as you did.
However, when you then predict from the model, you need to use the normalization used for the fit!
The problem is NOT how you estimated the model.
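For example, with the mean and std that the tool reports, the prediction needs to be built like this (a minimal sketch; a, b, c, d stand for the fitted coefficients):
mu = mean(x);                            % 27.5, as reported by the tool
S  = std(x);                             % about 15.14, as reported by the tool
xhat  = (x - mu)/S;                      % the SAME normalization used during the fit
ypred = a*exp(b*xhat) + c*exp(d*xhat);   % the coefficients only apply to the normalized x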


Answers (2)

Star Strider on 31 May 2020
I have no idea what the problem is. The fminsearch function had no problem with it.
The Code —
f = @(b,x) b(1).*exp(b(2).*x) + b(3).*exp(b(4).*x);
x = [5 10 15 20 25 30 35 40 45 50];
y = [140 88 62 49 38 31 25 20 17 12];
B = fminsearch(@(b) norm(y - f(b,x)), rand(4,1));
figure
plot(x, y, 'p')
hold on
plot(x, f(B,x), '-r')
hold off
grid
text(27, 100, sprintf('a = %7.3f\nb = %7.3f\nc = %7.3f\nd = %7.3f',B))
The Plot —
3 Comments
Star Strider on 31 May 2020
I do not have the Curve Fitting Toolbox, because the other toolboxes I have (Statistics and Machine Learning Toolbox, Optimization Toolbox, among others) plus my own mathematical and programming experience do everything I want.
Other than that, we know only what you said you did, not what you actually did. It is not possible to determine the problem. (I reversed the two vectors and my function still ran without error. The fit was appropriate and the parameters were different; however, they did not even closely resemble the parameters you previously reported, eliminating that as a source of the problem.)
My code gives the correct result. Use my ‘f’ function with nlinfit and nlparci to get equivalent results, with confidence intervals.
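A minimal sketch of that approach (requires the Statistics and Machine Learning Toolbox; the starting guesses b0 here are my own assumption):
f = @(b,x) b(1).*exp(b(2).*x) + b(3).*exp(b(4).*x);
x = [5 10 15 20 25 30 35 40 45 50];
y = [140 88 62 49 38 31 25 20 17 12];
b0 = [140; -0.1; 100; -0.05];            % rough starting guesses (assumed)
[B, R, J] = nlinfit(x, y, f, b0);        % nonlinear least-squares estimates and residuals
CI = nlparci(B, R, 'Jacobian', J)        % 95% confidence intervals for the coefficients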
Alex Sha on 1 Jun 2020
The results below seem to be better:
Root of Mean Square Error (RMSE): 0.581032486515321
Sum of Squared Residual: 3.37598750386176
Correlation Coef. (R): 0.999881471978295
R-Square: 0.999762958005483
Parameter Best Estimate
---------- -------------
a 165.438803583087
b -0.232608352963485
c 109.208242653762
d -0.0424020153019308
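These appear to be the same fit expressed in terms of the raw (un-normalized) x, so they can be checked directly (a quick sketch):
x = [5 10 15 20 25 30 35 40 45 50];
y = [140 88 62 49 38 31 25 20 17 12];
yalt = 165.4388*exp(-0.232608*x) + 109.2082*exp(-0.042402*x);
sum((y - yalt).^2)                       % should be close to the reported Sum of Squared Residual, 3.376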



John D'Errico on 1 Jun 2020
Edited: John D'Errico on 1 Jun 2020
I had to play with this for a while, because my first assumption was that you were using the wrong coefficients. In fact, while that costs you some accuracy, it is not what destroyed your results. That came down to forgetting to use the normalized variable x in your computation. You CANNOT use a normalized x in the fit, but then not use the same normalization to predict y.
In fact, using 4 digit approximations is a classic problem. People think that a 4 digit approximation to a coefficient is the coefficient. It is not. Just because you see the number reported to 4 significant digits does not mean it stops there.
My initial assumption was that the problem was NOT in the software used to estimate the model, but nothing more than using the wrong coefficients.
format long g
>> mu = mean(x)
mu =
27.5
>> S = std(x)
S =
15.1382517704875
>> xhat = (x - mu)/S;
>> mdl = fit(xhat',y','exp2')
mdl =
General model Exp2:
mdl(x) = a*exp(b*x) + c*exp(d*x)
Coefficients (with 95% confidence bounds):
a = 0.2758 (-0.1069, 0.6585)
b = -3.521 (-4.346, -2.696)
c = 34.03 (32.91, 35.15)
d = -0.6419 (-0.6992, -0.5846)
As you should see, these are exactly the same set of coefficients you claim to have gotten.
plot(x,mdl(xhat))
hold on
plot(x,y,'ro')
Again, those 4 significant digit approximations to the coefficients are NOT the coefficients. You always need to use the true values as estimated.
mdl.a
ans =
0.275764176155343
>> mdl.b
ans =
-3.52133177155047
>> mdl.c
ans =
34.0286408362909
>> mdl.d
ans =
-0.641895329124188
You need to use the full precision. And make sure you use the correct value for the normalizations too. Don't use a 4 digit approximation. If you do, then expect to get what is potentially random crapola.
I would have gotten the correct result also had I done this as:
ypred = mdl.a*exp(mdl.b*(x - mu)/S) + mdl.c*exp(mdl.d*(x - mu)/S);
In fact, this will give exactly the same predictions, as I claim it must. This I can verify.
norm(ypred' - mdl((x - mu)/S))
ans =
1.4210854715202e-14
To see how much of the problem is, in the end, just due to the 4 digit approximations to the coefficients, let me now try doing exactly that.
aappr = 0.2758;
bappr = -3.521;
cappr = 34.03;
dappr = -0.6419;
Sappr = 15.14;
muappr = 27.5;
yappr = aappr*exp(bappr*(x - muappr)/Sappr) + cappr*exp(dappr*(x - muappr)/Sappr);
When I plot that 4 digit approximation, I still get something that is not too far off. As you see, I got exactly the correct fit, because I did my fit the same way you did, by fitting using a normalized version of x.
Now, let me compute the prediction, but NOT using the normalized version of x. After all, that is what you did: you evaluated the model on the UN-NORMALIZED x!
ywrong = aappr*exp(bappr*x) + cappr*exp(dappr*x);
When you computed y1, you did not use the normalized version of the vector x. Now, let me show the results. LOOK CAREFULLY AT THE COLUMNS.
format short g
[y',ypred',yappr',ywrong']
ans =
140 140.05 139.99 1.374
88 87.627 87.612 0.055479
62 62.864 62.861 0.00224
49 48.347 48.347 9.0446e-05
38 38.327 38.328 3.6519e-06
31 30.76 30.762 1.4745e-07
25 24.807 24.809 5.9537e-09
20 20.044 20.046 2.4039e-10
17 16.207 16.209 9.7062e-12
12 13.109 13.11 3.919e-13
Column 1 is the real data.
Column 2 contains my predictions using the correct (full precision) set of coefficients.
Column 3 contains my predictions using the 4 digit approximations to the coefficients. As you can see, while it is not exact, the difference is not as large as what you reported. In fact, surprisingly, it is not that far off. There are relatively small errors, but not huge errors.
Column 4 is what happens if you use the UNNORMALIZED VERSION OF X.
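So if you want to predict y at new values of x from this fit, carry the normalization along (a short sketch, using the cfit object mdl, mu, and S from above; the new x values are just an example):
xnew = [12 22 33];                        % arbitrary new points (example values)
ynew = mdl((xnew - mu)/S)                 % normalize first, THEN evaluate the fitted model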
