How do I fit a regression equation to find coefficients and exponents?
Mostra commenti meno recenti
I'm currently working with a wide dataset, where I'm using a machine learning technique to select key identifiers (SIMPLS partial least squares). Then I want to use the identifiers and my outcome to create a predictive equation. I've tried a bunch of linear regression tools but they only find the predictor's coefficents, where I am trying to find the coefficients and the exponents. To get around this I'm trying to use 'nlinfit' to force the final equation into the desired form. This is where I'm having an issue, when I run the code I get the following error:
"The function you provided as the MODELFUN input has returned Inf or NaN values."
I've also tried inputting the model in the following form:
modelfun = 'y~(b1 + b2*x1.^b3 + b4*x2.^b5 + b6*x3.^b7)';
For reference my current data set is a 13x16 matrix, at it's largest it will be 24x35 matrix where the last column represents the outcome. Once the variable selection is complete (works without issue) the matrix is reduced to an nx4 matrix
Here is my code:
clear
clc
% Imports data and removed first text column
data = readtable('PMHS PLS Practice.xlsx',"textType","string");
data.SpecimenID = [];
% Splits data into independant and dependant
% variables and normalizes values
X = data(:,1:15);
X = X{:,:};
Xnorm = normalize(X);
Y = data(:,16);
Y = table2array(Y);
Ynorm = normalize(Y);
% Performs Simpls PLS on normalized data returning
% the X scores and percent varience per variable
% for the first 5 latent variables
Cpart = cvpartition(13,"LeaveOut");
[~,~,SCR,~,~,PCTVAR,~,~] = plsregress(Xnorm,Ynorm,5,'cv',Cpart);
X_VAR = PCTVAR(1,:);
Y_VAR = PCTVAR(2,:);
% pareto(X_VAR)
% finds the dependant variable with the largest
% contribution to the first 3 latent variables
[Var1,id1] = max(abs(SCR(:,1)));
[Var2,id2] = max(abs(SCR(:,2)));
[Var3,id3] = max(abs(SCR(:,3)));
% creates a matrix containing the selected variables
X_reg = [data{:,id1} data{:,id2} data{:,id3}];
% Fits the data with a non-linear model with
% initial coefficient guesses of beta0
modelfun = @(b,x) (b(1)+b(2)*x(:,1).^b(3)+b(4)*x(:,2).^b(5)+b(6)*x(:,3).^b(7));
beta0 = ones(1,7);
[coeff] = nlinfit(X_reg,Y,modelfun,beta0);
19 Commenti
Torsten
il 16 Nov 2022
We don't know your data.
But it is mandatory that x(:,1), x(:,2) and x(:,3) are vectors with only positive elements and without NaN or +/- Inf values.
Is this the case in your application ?
Liam
il 16 Nov 2022
Torsten
il 16 Nov 2022
Before calling nlinfit, try what you get when you call
Y_fit0 = modelfun(beta0,X_reg)
This is what nlinfit will do first and which results in the error message you get.
Does Y_fit0 contain Inf or NaN values as claimed by the error message ?
Liam
il 16 Nov 2022
It would help if you attached a .mat file containing Y and X_reg.
Also, are the exponent parameters b([3,5,7]) supposed to be non-negative? If so, it might be better to use a different fitting tool, one which allows you to specify lower bounds (e.g., lsqcurvefit or fit).
Are you sure that there are no zero values or values < 0 in the vectors X_reg ?
Are you sure Y contains no NaN or +/- Inf values ?
If you are sure for both questions, then make a function
function y = modelfun(b,x)
y = b(1)+b(2)*x(:,1).^b(3)+b(4)*x(:,2).^b(5)+b(6)*x(:,3).^b(7);
if any(isnan(y)) || any(isinf(y))
disp('Error in modelfun')
end
end
instead of your function handle and check when and why the output vector y has Inf or NaN values in it.
Liam
il 16 Nov 2022
Liam
il 16 Nov 2022
David Goodmanson
il 16 Nov 2022
Hi Liam, so x(:,1) etc. contain no zeros, is that correct?
Liam
il 16 Nov 2022
using the function that you provided will only return the Y_fit0 values because it's unsupervised, ie I'm passing the function the initial guesses for the beta values and the x predictors, thus it will only return the initial properties for Y.
No. If you remove the line
modelfun = @(b,x) (b(1)+b(2)*x(:,1).^b(3)+b(4)*x(:,2).^b(5)+b(6)*x(:,3).^b(7));
in your code and add the function I wrote instead to your code, this function will be called every time by nlinfit if the solver wants to evaluate Yfit. And the error message will be displayed if any of the Yfit values is NaN or +/-Inf.
Maybe it helps further to print the b-vector in this case:
function y = modelfun(b,x)
y = b(1)+b(2)*x(:,1).^b(3)+b(4)*x(:,2).^b(5)+b(6)*x(:,3).^b(7);
if any(isnan(y)) || any(isinf(y))
disp('Error in modelfun')
b
end
end
Liam
il 16 Nov 2022
[coeff] = nlinfit(X_reg,Y,@(b,x)modelfun(b,x),beta0);
instead of
[coeff] = nlinfit(X_reg,Y,modelfun,beta0);
Here are .mat files containing the data for X_reg and Y.
Why two .mat files?
All of the values are positive integers, some relatively small but positive.
I see non-integer values in your attached files,
>> format long; Y(1), X_reg(1)
ans =
4.562586000000000e+03
ans =
0.062228000000000
In fact all X_reg are non-integers less than 1.
Torsten
il 16 Nov 2022
And what coefficients and what Yfit values did you get from "modelfun" at this stage ?
Liam
il 16 Nov 2022
Liam
il 16 Nov 2022
Matt J
il 17 Nov 2022
I can't provide my raw data because it has identifiable donor health information.
But why 2 .mat files instead of 1.
Risposta accettata
Più risposte (1)
Alex Sha
il 17 Nov 2022
Although the results may seem strange, mathematically speaking, the result below is the best one:

Sum Squared Error (SSE): 875229.002284955
Root of Mean Square Error (RMSE): 259.47120816783
Correlation Coef. (R): 0.945217846153423
R-Square: 0.893436776686916
Parameter Best Estimate Std. Deviation Confidence Bounds[95%]
--------- ------------- -------------- --------------------------------
b1 12988.1117515585 6.29998367140824 [12972.6962468509, 13003.5272562661]
b2 1.533510681096E126 0.070612379218076 [1.533510681096E126, 1.533510681096E126]
b3 109.691046465045 0.167406667907439 [109.281417105381, 110.100675824708]
b4 -97.0000950427331 1048.74886499901 [-2663.19612168365, 2469.19593159819]
b5 -3.07105087006786 14192.7795542942 [-34731.5515429606, 34725.4094412205]
b6 -9.2988265393269E18 1.40462270054153E-15 [-9.2988265393269E18, -9.2988265393269E18]
b7 12.1425640359073 3.30305643552503E-124 [12.1425640359073, 12.1425640359073]
6 Commenti
Matt J
il 17 Nov 2022
the result below is the best one
One of the best ones perhaps. I don't think there is a unique best fit.
Alex Sha
il 17 Nov 2022
Should be unique, if ignoring the error caused by the accuracy of numerical computation
Sum Squared Error (SSE): 875229.002279992
Root of Mean Square Error (RMSE): 259.471208167095
Correlation Coef. (R): 0.945217846153723
R-Square: 0.893436776687483
Parameter Best Estimate
--------- -------------
b1 12988.1075276193
b2 1.53583923988692E126
b3 109.691633072097
b4 -96.9998016037511
b5 -3.07105262671452
b6 -9.2988221706396E18
b7 12.142564075687
Should be unique, if ignoring the error caused by the accuracy of numerical computation
I don't know why you think so. Even if we assume that your exponents b3,b5,b7 are correct and true, the condition number on the remaining unknowns is terrible:
load X_reg;
b3=109.691633072097;
b5=-3.07105262671452;
b7=12.142564075687;
A=X_reg.^[b3,b5,b7]; A(:,end+1)=1;
cond(A)
Alex Sha
il 20 Nov 2022
Hi, just discussion!
If there are various solutions as you said, what are the objective function values (SSE) Corresponding to those solutions? equal to or even less than the result I shown above, that is: SSE=875229.002279992, if your results are bad than this, it menas the results you get are all local solutions, not global one.
There is only one global optimization solution, like I provied previously.
Checking one result you given:
b1 = 1.0624e+04
b2 = 1.7367e+06
b4 = -116.6191
b6 = -3.0221e+15
b3 = 2.6098
b5 = -2.8463
b7 = 9.4736
The SSE value is: 1432079.63835075, it is much higher (poor) than 875229.002279992.
The reason why you get different solution each times computation is maybe the optimization algorithms you uesd are local optimization algorithms (LOA), not global optimization algorithms (GOA), LOA has no ability, and does not guarantee a global optimal solution.
Anyway, if you can get SSE value near to 875229.002279992, you will find that all parameter values will be fixed.
Matt J
il 20 Nov 2022
If there are various solutions as you said, what are the objective function values (SSE)
Probably very similar to what you got. My solution may be local and your solution may be global, but that does not mean the global solution is unique.
Alex Sha
il 20 Nov 2022
If possible, would you please be kind to show me one more global solution other than mine? so I can make some comparisons and find out why.
Categorie
Scopri di più su Mathematics and Optimization in Centro assistenza e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!