Fittype and fit do not work as expected (in a curve fitting)

Question

Sim il 30 Ott 2024 alle 13:42

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/2163055-fittype-and-fit-do-not-work-as-expected-in-a-curve-fitting

Commentato: Sim il 31 Ott 2024 alle 10:31

Let's consider the following curve fitting of a set of data, by using the fittype and fit functions:

% inputs
x = [1 4 9 15 25 35 45 55 65 75 85 95 150 250 350 450 900];
y = [0 335 28 37 9 4 3.5 2 1 0.7 0.6 0.5 0.3 0.1 0.01 0.005 0.001];
% curve fitting
ft = fittype('a*(x^n)','dependent','y','independent','x');
[fitresult, ~] = fit(x',y',ft,'Startpoint', [0 0]);
% plots
hold on
p1 = plot(x,y,'o-','LineWidth',1.5,'color','b','DisplayName','inputs');
p2 = plot(x,(fitresult.a .* (x.^fitresult.n)),'-','LineWidth',1.5,'color','k','DisplayName','best fit');
set(gca, 'XScale', 'log', 'YScale', 'log');
legend([p1 p2])

However if I perform the curve fitting with EzyFit, I get the following (correct) result, as expected:

Why don't fittype and fit functions work correctly in this example?

Did I use those two functions in a wrong way?

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

John D'Errico il 30 Ott 2024 alle 14:12

2
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/2163055-fittype-and-fit-do-not-work-as-expected-in-a-curve-fitting#answer_1539370

Modificato: John D'Errico il 30 Ott 2024 alle 14:23

Apri in MATLAB Online

x = [1 4 9 15 25 35 45 55 65 75 85 95 150 250 350 450 900];
y = [0 335 28 37 9 4 3.5 2 1 0.7 0.6 0.5 0.3 0.1 0.01 0.005 0.001];

Look CAREFULLY at your data!

What is y(1)? Do you see that it is tremendously out of character, inconsistent with the rest of your data?

y(1)
ans = 0

Of course, when you plot it on a log scale, MATLAB does not plot that point, because log(0) is undefined. (-inf if you insist.) HOWEVER, when you do a fit, fit sees it!

And what that does is completely drag your curve around. y(1) = 1000 or larger might be more consistent. But not zero.

ft = fittype('a*(x^n)','dependent','y','independent','x');

Next, starting ANY optimization at [0,0] is a HIGHLY, HUGELY dangerous thing to do.

[fitresult, ~] = fit(x(2:end)',y(2:end)',ft,'Startpoint', [1 -1])

fitresult =

General model: fitresult(x) = a*(x^n) Coefficients (with 95% confidence bounds): a = 1.222e+04 (4056, 2.038e+04) n = -2.596 (-3.072, -2.12)

p1 = plot(x,y,'o-','LineWidth',1.5,'color','b','DisplayName','inputs');

hold on

p2 = plot(x,(fitresult.a .* (x.^fitresult.n)),'-','LineWidth',1.5,'color','k','DisplayName','best fit');

set(gca, 'XScale', 'log', 'YScale', 'log');

legend([p1 p2])

hold off

Is the fit very good, when plotted on a log scale? Well, not really. But it never is a good idea to do this sort of thing. Why not? Because MATLAB will not worry about TINY deviations in those lower points, even though in proportional terms, it will be very obvious on a log scale.

What you should see is the fit fits the first two points quite well, and then misses the rest. But again, think about what a log scale does!!!!!

fit assumes an additive error model, with the errors presumed to be homogeneous, and additive. That is a bad thing when your dats spans 4 or 5 powers of 10. It would be far smarter to log your data and your model, THEN do the fit. Now instead of proportional error, you will have additive error. See how nice it gets?

When you log your model, the problem becomes trivial.

Your model: y = a*x^n * (proportional noise)

Logged model: log(y) = log(a) + n*log(x) + (additive homogeneous noise)

We still need to drop the bogus first data point of course. But now your model is just a linear 'poly1' model.

mdl = fit(log(x(2:end)'),log(y(2:end)'),'poly1')
mdl = 
     Linear model Poly1:
     mdl(x) = p1*x + p2
     Coefficients (with 95% confidence bounds):
       p1 =      -2.292  (-2.525, -2.059)
       p2 =       9.469  (8.431, 10.51)

Don't forget to exponentiate the coefficient out front!

a = exp(mdl.p2)
a = 1.2953e+04
n = mdl.p1
n = -2.2920

Or, we could use fittype

ft = fittype('log(a) + n*log(x)','dependent','y','independent','x');

Note that I logged x inside fittype, but not y.

[fitresult, ~] = fit(x(2:end)',log(y(2:end)'),ft,'Startpoint', [1 -1])
fitresult = 
     General model:
     fitresult(x) = log(a) + n*log(x)
     Coefficients (with 95% confidence bounds):
       a =   1.295e+04  (-488, 2.639e+04)
       n =      -2.292  (-2.525, -2.059)

And you see this generates the same result for the parameters.

8 Commenti
Mostra 6 commenti meno recentiNascondi 6 commenti meno recenti

John D'Errico il 30 Ott 2024 alle 16:27

Modificato: John D'Errico il 30 Ott 2024 alle 16:45

Apri in MATLAB Online

Yeah, sometimes, the world just hates you, no? It just won't let you win. ;-)

When there is an additive constant in the problem, AND y varies over many orders of magnitude, and you have proportional noise, then you will have difficulties estimating that additive constant well. But there will be several things you can do. The best is probably to use weights. I'll use them on your own problem, and compare the results.

x = [4 9 15 25 35 45 55 65 75 85 95 150 250 350 450 900];
y = [335 28 37 9 4 3.5 2 1 0.7 0.6 0.5 0.3 0.1 0.01 0.005 0.001];

Note that, depending on how the weights are employed in different codes, you may or may not need to square y below.

w = 1./y.^2;

ft = fittype('a*(x^n)','dependent','y','independent','x');

[fitresult, ~] = fit(x',y',ft,'Startpoint', [1,-1])

fitresult =

General model: fitresult(x) = a*(x^n) Coefficients (with 95% confidence bounds): a = 1.222e+04 (4056, 2.038e+04) n = -2.596 (-3.072, -2.12)

[fitresultw, ~] = fit(x',y',ft,'Startpoint', [1 -1],'weights',w)

fitresultw =

General model: fitresultw(x) = a*(x^n) Coefficients (with 95% confidence bounds): a = 7662 (1154, 1.417e+04) n = -2.284 (-2.465, -2.104)

mdl = fit(log(x'),log(y'),'poly1')

mdl =

Linear model Poly1: mdl(x) = p1*x + p2 Coefficients (with 95% confidence bounds): p1 = -2.292 (-2.525, -2.059) p2 = 9.469 (8.431, 10.51)

a = exp(mdl.p2);

n = mdl.p1;

hold on

p1 = plot(x,y,'o','LineWidth',1.5,'color','b','DisplayName','inputs');

p2 = plot(x,(fitresult.a .* (x.^fitresult.n)),'-','LineWidth',1.5,'color','k','DisplayName','Unweighted');

p3 = plot(x,(fitresultw.a .* (x.^fitresultw.n)),'-','LineWidth',1.5,'color','r','DisplayName','Weights');

p4 = plot(x,(a .* (x.^n)),'-','LineWidth',1.5,'color','g','DisplayName','Logged model');

set(gca, 'XScale', 'log', 'YScale', 'log');

grid on

legend([p1 p2 p3 p4])

A subtly different result from the logged model, but not unreasonable either, and probably either is as good as the data merits. The best answer to my own eye might be mid-way between the logged model fit, and the weighted fit (the red and green lines.)

Anyway, weights would be my choice of a solution for problems with such an additive parameter in the model, and logging the model will not resolve the issue..

Could you do it in other ways when you have an additive parameter? Well, surely yes. For example, in this model:

y = (a*(x^n) + b )*(Proportional noise)

Again, we can log the model. It leaves us with perhaps a subtle problem, but one we can handle.

ft = fittype('log(a*x^n + b)',...
mdl = fit(x,log(y),...

What is the subtle problem? I would guess the b parameter will be difficult to estimate well, because changes in b will do very little to impact y on most data points. But that is not so much a problem in the way I would choose to solve it, but a problem implicit in the model itself.

Sim il 30 Ott 2024 alle 20:49

Modificato: Sim il 30 Ott 2024 alle 21:14

Apri in MATLAB Online

Thanks for your further explanation, which I found very interesting!

I was thinking to explore the curve fitting with the additional parameter (the y-intercept), i.e.

ft = fittype('log(a*x^n + b)',...
mdl = fit(x,log(y),...

and see what are the differences with the same equation without the additional parameter, i.e.:

ft = fittype('log(a) + n*log(x)',...
mdl = fit(x,log(y),...

Although "fittype('log(a) + n*log(x)',..." works, as we have seen....

x = [4 9 15 25 35 45 55 65 75 85 95 150 250 350 450 900];
y = [335 28 37 9 4 3.5 2 1 0.7 0.6 0.5 0.3 0.1 0.01 0.005 0.001];
ft = fittype('log(a*x^n)','independent','x');
[fitresult, gof] = fit(x',log(y)',ft,'Startpoint', [1 -1]);
hold on
plot(x,y,'o-')
plot(x,(fitresult.a .* (x.^fitresult.n)),'-')
set(gca, 'XScale', 'log', 'YScale', 'log');

the same curve fitting, with the additional parameter, does not work anymore...

x = [4 9 15 25 35 45 55 65 75 85 95 150 250 350 450 900];
y = [335 28 37 9 4 3.5 2 1 0.7 0.6 0.5 0.3 0.1 0.01 0.005 0.001];
ft = fittype('log(a*x^n + b)','independent','x');
[fitresult, gof] = fit(x',log(y)',ft,'Startpoint', [1 -1 1]);
hold on
plot(x,y,'o-')
plot(x,(fitresult.a .* (x.^fitresult.n)),'-')
set(gca, 'XScale', 'log', 'YScale', 'log');

...leading to this error....

Error using fit>iFit
Complex value computed by model function, fitting cannot continue.
Try using or tightening upper and lower bounds on coefficients.
Error in fit (line 117)
[fitobj, goodness, output, convmsg] = iFit( xdatain, ydatain, fittypeobj, ...
Error in untitled6 (line 15)
[fitresult, gof] = fit(x',log(y)',ft,'Startpoint', [1 -1 1]);

I do not know if the 'Startpoint' can be a source for this error... maybe, I could open a new question, if I am not able to solve it soon :-)

Again, many thanks for your great help! I am confident other people will find your solutions and explanations very helpful :-)

John D'Errico il 30 Ott 2024 alle 23:21

Modificato: John D'Errico il 30 Ott 2024 alle 23:24

Apri in MATLAB Online

Look carefully at the model. What happens if fit decides to try a negative value of b in that model? What if the value is sufficiently negative? What is the log of a negative number? (complex.) Now look back at the error message generated.

ft = fittype('log(a*x^n + b)','independent','x');

[fitresult, gof] = fit(x',log(y)',ft,'Startpoint', [1 -1 1]);

What this means is IF you want to try that model (I'd strongly suggest using weights instead!) then you need to constrain the parameter b. Fit does allow bound constraints on the parameters, I think. Its been a while since I looked. A quick glance however tells me that fitoptions does help you.

help fitoptions
 fitoptions - Create or modify fit options object
    This MATLAB function creates the default fit options object fitOptions.

    Syntax
      fitOptions = fitoptions

      fitOptions = fitoptions(libraryModelName)
      fitOptions = fitoptions(libraryModelName,Name,Value)

      fitOptions = fitoptions(fitType)

      fitOptions = fitoptions(Name,Value)

      newOptions = fitoptions(fitOptions,Name,Value)
      newOptions = fitoptions(options1,options2)

    Input Arguments
      libraryModelName - Library model to fit
        character vector | string scalar
      fitType - Model type to fit
        fittype
      fitOptions - Algorithm options
        fitoptions
      options1 - Algorithm options to combine
        fitoptions
      options2 - Algorithm options to combine
        fitoptions

    Name-Value Arguments
      Options for All Fitting Methods
        Normalize - Option to center and scale data
          'off' (default) | 'on'
        Exclude - Points to exclude from fit
          expression | index vector | logical vector | empty
        Weights - Weights for fit
          [ ] (default) | vector
        Method - Fitting method
          'None' (default) | 'NearestInterpolant | 'LinearInterpolant' |
          'PchipInterpolant' | CubicSplineInterpolant' | ...

      Interpolation Options
        ExtrapolationMethod - Extrapolation method
          "auto" (default) | "none" | "linear" | "nearest" | "thinplate" |
          "biharmonic" | "pchip" | "cubic"

      Smoothing Options
        SmoothingParam - Smoothing parameter
          scalar value in the range (0,1)
        Span - Proportion of data points to use in local regressions
          0.25 (default) | scalar value in the range (0,1)

      Linear and Nonlinear Least-Squares Options
        Robust - Robust linear least-squares fitting method
          'off' (default) | 'LAR' | 'Bisquare'
        Lower - Lower bounds on coefficients to be fitted
          [ ] (default) | vector
        Upper - Upper bounds on coefficients to be fitted
          [ ] (default) | vector

      Nonlinear Least-Squares Options
        StartPoint - Initial values for coefficients
          [ ] (default) | vector
        Algorithm - Algorithm to use for fitting procedure
          "Trust-Region" (default) | "Levenberg-Marquardt"
        DiffMaxChange - Maximum change in coefficients for finite difference gradients
          0.1 (default)
        DiffMinChange - Minimum change in coefficients for finite difference gradients
          10^{–8} (default)
        Display - Display option in the Command Window
          'notify' (default) | 'final' | 'iter' | 'off'
        MaxFunEvals - Maximum number of evaluations of model allowed
          600 (default)
        MaxIter - Maximum number of iterations allowed for fit
          400 (default)
        TolFun - Termination tolerance on model value
          10^{–6} (default)
        TolX - Termination tolerance on coefficient values
          10^{–6} (default)

    Output Arguments
      fitOptions - Algorithm options
        options object
      newOptions - New algorithm options
        options object

    Examples
      openExample('curvefit/ModifyDefaultFitOptionstoNormalizeDataExample')
      openExample('curvefit/CreateDefaultFitOptionsforGaussianFitExample')
      openExample('curvefit/SetPolynomialFitOptionsExample')
      openExample('curvefit/CreateFitOptionsforLinearLeastSquaresExample')
      openExample('curvefit/SpecifyExtrapolationMethodForInterpolantFitsExample')
      openExample('curvefit/ReuseFitOptionsInMultipleFitsExample')
      openExample('curvefit/FindandChangetheSmoothingFitOptionExample')
      openExample('curvefit/ApplyCoefficientBoundstoImproveGaussianFitExample')
      openExample('curvefit/CopyandCombineFitOptionsExample')
      openExample('curvefit/ChangeCustomModelFitOptionsExample')

    See also Curve Fitter, fit, fittype, get, set, setoptions

    Introduced in Curve Fitting Toolbox before R2006a
    Documentation for fitoptions
       doc fitoptions

    Other uses of fitoptions

       fittype/fitoptions

Regardless, I'd just be lazy and use a weighted regression fit.

x = [4 9 15 25 35 45 55 65 75 85 95 150 250 350 450 900];
y = [335 28 37 9 4 3.5 2 1 0.7 0.6 0.5 0.3 0.1 0.01 0.005 0.001];
w = 1./y.^2;
ft = fittype('a*x^n + b','independent','x');

Note my choice of an intelligent set of starting values, based on the previous fits. I did not need to be perfect, just get it into a reasonable ballpark.

[fitresult, gof] = fit(x',y',ft,'Startpoint', [1000 -2 0.001],'weights',w)

fitresult =

General model: fitresult(x) = a*x^n + b Coefficients (with 95% confidence bounds): a = 6370 (341.8, 1.24e+04) b = -0.001035 (-0.003061, 0.0009918) n = -2.209 (-2.434, -1.984)

gof = struct with fields:

sse: 3.9757 rsquare: 0.7227 dfe: 13 adjrsquare: 0.6801 rmse: 0.5530

hold on

plot(x,y,'o-')

plot(x,(fitresult.a .* (x.^fitresult.n) + fitresult.b),'-')

set(gca, 'XScale', 'log', 'YScale', 'log');

Finally, note that the estimated value for b is a negative value, thus -0.001035. fit really wants to make that last value generate a negative prediction.

John D'Errico il 31 Ott 2024 alle 9:23

Apri in MATLAB Online

Bound constraints in optimization can sometimes be a problem. As with all constraints, they must employ a tolerance, such that the constraint can sometimes be exceeded by a tiny amount, the constraint tolerance. And while that amount can seem tiny, it is highly important.

For example, suppose you decide to compute the log(x), and you never wish to see a complex result? Is it acceptable to have any negative value of x?

log(-1e-300)
ans = -6.9078e+02 + 3.1416e+00i

The point is, a function like a log or sqrt can never accept ANY negative value at all as an argument, as you will see a complex result.

But when you set a lower bound of 0 for some parameter x, the optimization tool will allow that constraint to be exceeded by the constraint tolerance. And you cannot set the constraint tolerance to be exactly 0.

As I said, when working with logs, constraints can be problematic. You will need to set the lower bound high enough that the lower bound minus the constraint tolerance does not go below zero.

Sim il 31 Ott 2024 alle 10:31

Very interesting! Yes working with logs can be messy sometimes.. To avoid as much as possible possible troubles, I am following the "path" with the weights that you showed me, thanks a lot :-)

Accedi per commentare.

Fittype and fit do not work as expected (in a curve fitting)

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

8 Commenti
Mostra 6 commenti meno recentiNascondi 6 commenti meno recenti

Più risposte (0)

Vedere anche

Categorie

Tag

Community Treasure Hunt

Fittype and fit do not work as expected (in a curve fitting)

0 Commenti Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Risposta accettata

8 Commenti Mostra 6 commenti meno recentiNascondi 6 commenti meno recenti

Più risposte (0)

Vedere anche

Categorie

Tag

Community Treasure Hunt

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

8 Commenti
Mostra 6 commenti meno recentiNascondi 6 commenti meno recenti