How can I reduce error on loglog scale using linear regression?

This is what I am doing with my imported data. What can I do to reduce the multiplicative errors? I tried to adapt non-linear regression to my script, but I don't understand the examples that I've found so far. If possible, please suggest what I could do in both scenarios:
  1. Reduce the error in linear regression
  2. Apply non-linear regression instead with every line explained
The plot of my data in log scale is shown below.
Thanks
% x: assume any column vector x
% y: assume any column vector y
loglog(x,y, '*');
% Estimate the best-fit line in log-log space
const = polyfit(log(x),log(y), 1);
m = const(1);
k = const(2);
bfit = x.^m.*exp(k); % y = x^m * exp(k)
hold on
loglog(x,bfit)
[Image: Screen Shot 2018-11-13 at 00.33.57.png]

Accepted Answer

Change them to additive errors (as they should be) using nonlinear regression techniques.
Example (from another Answer):
temp = [100,200,400,600,800,1000,1200,1400,1600]';
density = [3.5,1.7,0.85,0.6,0.45,0.35,0.3,0.25,0.2]';
fcn = @(b,x) exp(b(1).*x).*exp(b(2)) + b(3);
[B,rsdnorm] = fminsearch(@(b) norm(density - fcn(b,temp)), [-0.01; max(density); min(density)]);
fprintf(1, 'Slope \t\t=%10.5f\nIntercept \t=%10.5f\nOffset \t\t=%10.5f\n', B)
tv = linspace(min(temp), max(temp));
figure
plot(temp, density, 'p')
hold on
plot(tv, fcn(B,tv), '-')
grid
text(500, 1.7, sprintf('f(x) = %.2f\\cdot e^{%.4f\\cdot x} + %.2f', exp(B(2)), B(1), B(3)))  % the multiplier in the model is exp(B(2)), not B(2)
There are a number of functions you can use to do nonlinear parameter estimation in MATLAB. I use fminsearch here because everybody has it.
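Since the question asks for every line to be explained, here is a minimal sketch that splits the fminsearch call from the example above into named steps. It uses the same data and model; the names cost and b0 are introduced here only for illustration and are not part of the original answer.

```matlab
temp    = [100,200,400,600,800,1000,1200,1400,1600]';
density = [3.5,1.7,0.85,0.6,0.45,0.35,0.3,0.25,0.2]';

% Model: b is the parameter vector, x is the independent variable
fcn = @(b,x) exp(b(1).*x).*exp(b(2)) + b(3);

% Cost: length of the residual vector (data minus model), a function of b only.
% The residuals are taken on y itself, not on log(y), so the errors are additive.
cost = @(b) norm(density - fcn(b,temp));

% Starting guess, the same one used in the example above
b0 = [-0.01; max(density); min(density)];

% fminsearch varies b, starting from b0, to make cost(b) as small as it can
[B, rsdnorm] = fminsearch(cost, b0);

fprintf('Residual norm at start: %g\n', cost(b0));
fprintf('Residual norm at end:   %g\n', rsdnorm);
```

Comparing the two printed values is a quick way to check that the search actually improved on the starting guess.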

7 Comments

Thank you @Star Strider. Would you mind telling me where fcn comes from? I don't get that bit...
Here are my actual x and y. When I do what you did I simply get a flat horizontal line that does not fit the data at all...
>> x = [250000000;350000000;400000000;450000000;250000000;350000000;450000000];
>> y =[7.43184000000000e-05;0.000253574900000000;0.000327284100000000;0.000337806300000000;0.000759150650000000;0.000962776550000000;0.00127277395000000]
[Image: Screen Shot 2018-11-13 at 12.11.49.png]
My pleasure.
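The ‘fcn’ function is the model being fitted. In that example it comes from the semi-log relation log(y) = b(1)*x + b(2), which after taking antilogs and adding an offset ‘b(3)’ gives the expression in ‘fcn’.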
Your loglog function is slightly different:
log(y) = m*log(x) + b
taking antilogs of both sides, transforms to:
y = x^m * exp(b)
You are getting a horizontal line because of the magnitude of your ‘x’ values with respect to your ‘y’ values.
I had to include a ‘fudge factor’ to scale your ‘x’ values, since they are so large in comparison to your ‘y’ values. I leave you to scale the parameter values and make any further adjustments:
x = [250000000;350000000;400000000;450000000;250000000;350000000;450000000];
y =[7.43184000000000e-05;0.000253574900000000;0.000327284100000000;0.000337806300000000;0.000759150650000000;0.000962776550000000;0.00127277395000000];
fcn = @(b,x) (x/b(4)).^b(1) .* exp(b(2)) + b(3);
[B,rsdnorm] = fminsearch(@(b) norm(y - fcn(b,x)), [3E-4; 0.04; -1; 1E+8])
tv = linspace(min(x), max(x));
figure
plot(x, y, 'p')
hold on
plot(tv, fcn(B,tv), '-')
grid
text(2.7E+8, 0.0011, sprintf('f(x) = (x/%.1E)^{%.4f} \\cdot %.2f %+.2f', B(4), B(1), exp(B(2)), B(3)))  % exponent is B(1); multiplier is exp(B(2))
The parameters this code estimates are:
B =
0.00048594
0.047982
-1.0495
6.0403e+07
Sorry, still don't get it.
If my loglog function leads to (I get this)
y = x^m * exp(b)
Where did the one below come from?
@(b,x) (x/b(4)).^b(1) .* exp(b(2)) + b(3);
And what does each b(n) represent?
That is essentially the same expression, with two ‘tweaks’. The first is that ‘b(4)’ scales your ‘x’ coordinate (the other option would be to subtract ‘b(4)’ to centre your ‘x’ values), and the second is a y-offset ‘b(3)’ that is essentially the same as an ‘intercept’ term. The ‘m’ parameter is ‘b(1)’, and the ‘b’ parameter is ‘b(2)’.
Subtracting ‘b(4)’ instead of dividing by it, the ‘fcn’ function becomes:
fcn = @(b,x) (x-b(4)).^b(1) .* exp(b(2)) + b(3);
and the fitted parameters are:
B =
0.000734759132744285
0.0898781101211069
-1.10975948838436
-191465367.406798
The parameter vector is necessary because of the way the MATLAB nonlinear parameter estimation and other optimisation routines work. They require that parameters be expressed as elements of the same vector.
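To make the mapping concrete, the sketch below evaluates the scaled model once with a made-up parameter vector and compares it with the same numbers substituted by hand. The values in b and x are illustrative only and are not fitted to any data.

```matlab
% Same model as the scaled fcn above, with each element of b described:
% b(1) : exponent m (the slope on the log-log plot)
% b(2) : log of the multiplicative constant, so exp(b(2)) scales the curve
% b(3) : additive y-offset
% b(4) : divisor that brings the large x values closer to 1
fcn = @(b,x) (x./b(4)).^b(1) .* exp(b(2)) + b(3);

b = [2; log(3); 0.5; 10];          % illustrative values only
x = [10; 20; 40];

yModel  = fcn(b, x);               % model evaluated through the parameter vector
yByHand = (x./10).^2 .* 3 + 0.5;   % the same numbers substituted by hand

disp([yModel, yByHand])            % the two columns agree
```

The optimiser only ever sees the vector b, which is why all four quantities have to be packed into it.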
3E-4; 0.04; -1; 1E+8
How can you tell that these are the most accurate guesses? Even with the slightest change the results differ a lot...
What are the criteria for estimating these?
I'm struggling to find explanatory content regarding this bit...
The fminsearch function is much more sensitive to initial parameter estimates than other optimisation routines. I decided to let the Global Optimization Toolbox genetic algorithm ga function see what it could come up with, using patternsearch to fine-tune the parameter estimates.
The best were:
B =
3.29616368688383 -75.2881776260513 0.0003711181640625 -256076512.169127
producing a residual norm of 0.0010247, and:
[Image: How can I reduce error on loglog scale using linear regression - 2018 11 13.png]
This is simply the nature of many nonlinear parameter estimation problems. Your problem is particularly difficult because of the range and magnitude of your data, and the small number of data points you have.
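One common way to pick starting values without guessing, sketched here under stated assumptions, is to take them from the log-log polyfit in the original question and let fminsearch refine them against additive errors. This is a simplification and not the model used above: it drops the offset and fits only the two parameters of y = x^m * exp(k), and it divides x by a fixed constant (1e8, chosen here purely for illustration) rather than fitting the scale.

```matlab
% Data from the thread
x = [250000000;350000000;400000000;450000000;250000000;350000000;450000000];
y = [7.43184e-05;0.0002535749;0.0003272841;0.0003378063;0.00075915065;0.00096277655;0.00127277395];

xs = x / 1e8;                        % fixed rescaling (assumption): x becomes 2.5 to 4.5

% Step 1: straight-line fit in log-log space gives the starting point
p  = polyfit(log(xs), log(y), 1);    % p(1) = exponent m, p(2) = k (log of the constant)
b0 = [p(1); p(2)];

% Step 2: refine the same two parameters against additive errors in y
model   = @(b,x) x.^b(1) .* exp(b(2));
resnorm = @(b) norm(y - model(b, xs));
[B, rsdnorm] = fminsearch(resnorm, b0);

fprintf('start: m = %.4f, k = %.4f, residual norm = %.3g\n', b0, resnorm(b0));
fprintf('final: m = %.4f, k = %.4f, residual norm = %.3g\n', B, rsdnorm);
```

Because the fit was done on xs, the constant exp(B(2)) applies to x/1e8; multiply it by (1e8)^(-B(1)) to express the curve in terms of the original x.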
