Linear regression on training set
I have some data that I want to divide into a training set and a validation set, in order to do linear regression on the training set to find y0 and r. The training set should contain at least 50% of the data. My code so far is below:
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A))
subSet1=A(idx(1:5)) %Trainingset
subSet2=A(idx(6:end)) %Validationset
If I can assume the function is exponential, y(t) = y0*e^(r*t), how do I continue in order to plot the training set and find y0 and r?
Thankful for all help!
9 Comments
J. Alex Lee
on 10 Sep 2020
you already identified that your regression can be made into linear form, so that's already a big hint for you...
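For the record, the linear form hinted at here comes from taking the natural log of both sides of the model:

```latex
y(t) = y_0 e^{rt}
\quad\Longrightarrow\quad
\ln y(t) = \ln y_0 + r\,t
```

so a degree-1 polyfit of ln(A) against t gives r as the slope and ln(y0) as the intercept.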
katara
on 10 Sep 2020
Johannes Hougaard
on 10 Sep 2020
The five t values that correspond to the randomly chosen A values are picked out by indexing t with the same idx vector you used for A.
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1 = t(idx(1:5)); %t values for Trainingset
y=log(subSet1);
c=polyfit(t1,y, 1)
r=c(1);
lny0=c(2);
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*')
And to apply your polyfit result you could just use polyval.
% Or you could use
y2 = exp(polyval(c,t));
plot(t,y2);
Johannes has the right approach (maybe it can be written as an answer). It can be generalized to any size dataset using
idx = randperm(numel(A));
nTrain = ceil(numel(A)/2);
% nTest = numel(A)-nTrain; % if needed
trainIdx = idx(1:nTrain);       % index through the permutation, not 1:nTrain
testIdx = idx(nTrain+1:end);
trainSet = [A(trainIdx); t(trainIdx)]; % assuming A and t are row vectors
testSet = [A(testIdx); t(testIdx)]; % same assumption
% Then proceed with fitting on the trainSet and measuring
% error on the testSet
Also note that if you're planning on using a more rigorous cross validation, use cvpartition to partition your data.
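A minimal sketch of what that cvpartition split could look like for this data, assuming the Statistics and Machine Learning Toolbox is available (the 0.5 holdout fraction is just one choice that satisfies the "at least 50% training" requirement):

```matlab
% Holdout split via cvpartition; training(cv)/test(cv) return logical masks.
A = [130, 300, 400, 500, 650, 1075, 2222, 2550, 3300];
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
cv = cvpartition(numel(A), 'HoldOut', 0.5);  % random 50/50 holdout
trainA = A(training(cv));  traint = t(training(cv));
testA  = A(test(cv));      testt  = t(test(cv));
c = polyfit(traint, log(trainA), 1);         % fit on the training half
rmse = sqrt(mean((testA - exp(polyval(c, testt))).^2)) % validation error
```

The same cv object also supports 'KFold' partitions if you later want proper cross validation instead of a single holdout.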
katara
on 10 Sep 2020
J. Alex Lee
on 10 Sep 2020
You just need to exponentiate the result of polyval (remember you took the log), and I would wager the plot you really want is
plot(t,A,'*',t,exp(polyval(c,t)))
Or if I may:
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300];
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1=t(idx(1:5)); %t values for Trainingset
t2=t(idx(6:end)); %t values for Validationset
y=log(subSet1);
c=polyfit(t1,y, 1)
p=polyval(c,t);
r=c(1);
y0=exp(c(2));
yMdlFn = @(t)(y0*exp(r*t));
% to evaluate on test set
yMdlTest = yMdlFn(t2)
% more comprehensive plot
figure(1); cla; hold on
plot(t1,subSet1,'*')
plot(t2,subSet2,'o')
fplot(yMdlFn,[1929,2009])
But I'd also recommend implementing Adam's generalization to arbitrarily large data sets partitioned into arbitrarily sized training and test sets (just check that the split actually uses the random permutation idx).
Image Analyst
on 10 Sep 2020
If you want a log fit, use fitnlm() rather than polyfit().
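A sketch of what the fitnlm() route could look like (Statistics and Machine Learning Toolbox). The reference year t0, the starting guesses in beta0, and fitting on the full data rather than a training subset are all assumptions for illustration; centering t at t0 keeps exp() well scaled for year-valued inputs:

```matlab
% Direct nonlinear fit of y = b(1)*exp(b(2)*(t - t0)); b(2) plays the
% role of r, and y0 referenced to t = 0 would be b(1)*exp(-b(2)*t0).
A = [130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008]';
t0 = 1930;                                    % assumed reference year
modelfun = @(b, t) b(1) .* exp(b(2) .* (t - t0));
beta0 = [130, 0.04];                          % rough starting guesses
mdl = fitnlm(t, A, modelfun, beta0)
```

Unlike the polyfit-on-log approach, this minimizes residuals in the original y units, which weights the large values more heavily.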
J. Alex Lee
on 10 Sep 2020
I would take linear least squares anywhere I can get it, including this situation. Linear fitting doesn't require initial guesses, is guaranteed to give a "result", and is faster. You could then use the result of the polyfit to seed a nonlinear fit, if you want to define the least squares differently. But you're still left with a choice of how to define your residual anyway, so you have a lot more things to worry about if you care to that level with nonlinear fitting.
Answers (1)
Johannes Hougaard
on 11 Sep 2020
The five t values that correspond to the randomly chosen A values are picked out by indexing t with the same idx vector you used for A.
A=[130, 300, 400, 500, 650, 1075, 2222, 2550, 3300]';
t = [1930, 1943, 1966, 1976, 1991, 1994, 2000, 2005, 2008];
idx=randperm(numel(A));
subSet1=A(idx(1:5)); %Trainingset
subSet2=A(idx(6:end)); %Validationset
t1 = t(idx(1:5)); %t values for Trainingset
y=log(subSet1);
c=polyfit(t1,y, 1)
r=c(1);
lny0=c(2);
y0=exp(c(2));
y2 = y0*exp(r*t);
plot(t,y2,'*')
And to apply your polyfit result you could just use polyval.
% Or you could use
y2 = exp(polyval(c,t));
plot(t,y2);