How to perform OLS regression using combinations of independent variables?
Hi!
I have been struggling for a while with the following problem.
Suppose we have y as a dependent variable and x1,...,xn as exogenous variables (n>7).
What I want to do is determine which combination of exogenous variables gives the best fit for y.
So if we have, for example, 3 exogenous variables, I would like to see which of the following regressions best fits y (assuming I already know which statistic I will use to discriminate a "good" model from a "bad" one):
y~x1 ;
y~x2 ;
y~x3 ;
y~x1+x2 ;
y~x1+x3 ;
y~x2+x3 ;
y~x1+x2+x3
For only 3 variables it is not that complicated (2^3-1 = 7 possibilities). The problem appears when I introduce more and more exogenous variables (2^7-1 = 127). How can I fit all combinations automatically when the number of exogenous variables is large (>7)?
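(For reference, such an exhaustive search can be automated by encoding each subset of predictors as a bitmask. A minimal sketch, assuming the predictors are the columns of an N-by-n matrix X, the response is an N-by-1 vector y, and adjusted R^2 is used as a stand-in criterion:)

```matlab
% Exhaustive search over all non-empty subsets of predictors.
n = size(X, 2);
N = size(X, 1);
bestCrit = -Inf;
bestSubset = [];
for mask = 1:(2^n - 1)
    cols = find(bitget(mask, 1:n));          % columns in this subset
    Xs = [ones(N,1), X(:, cols)];            % add an intercept column
    b = Xs \ y;                              % OLS fit for this subset
    res = y - Xs*b;
    R2 = 1 - sum(res.^2) / sum((y - mean(y)).^2);
    k = numel(cols);
    adjR2 = 1 - (1 - R2)*(N - 1)/(N - k - 1);  % penalize model size
    if adjR2 > bestCrit
        bestCrit = adjR2;
        bestSubset = cols;
    end
end
fprintf('Best subset: %s (adj. R^2 = %.4f)\n', mat2str(bestSubset), bestCrit)
```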
Thanks for your help!
Cheers!
0 Comments
Answers (3)
Image Analyst
on 29 Nov 2014
Why not just use all of them and let the regression figure out how to weight the different xn?
y = alpha0 + alpha1 * x1 + alpha2 * x2 + alpha3 * x3
You can't use polyfit() but you can use the standard least squares formula
alpha = inv(x' * x) * x' * y; % Get estimate of the alphas.
where x is an N-by-4 matrix:
1, x1(1), x2(1), x3(1)
1, x1(2), x2(2), x3(2)
1, x1(3), x2(3), x3(3)
1, x1(4), x2(4), x3(4)
...
1, x1(N), x2(N), x3(N)
If one of the xn is not a good predictor, it should have a small alpha weight.
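Concretely, the design-matrix fit above can be sketched as follows (assuming x1, x2, x3, and y are N-by-1 column vectors):

```matlab
% Assemble the N-by-4 design matrix shown above and estimate the alphas.
N = numel(y);
x = [ones(N,1), x1, x2, x3];   % first column of ones gives the intercept
alpha = x \ y;                 % least-squares solution
yfit = x * alpha;              % fitted values, for inspection
```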
1 Comment
Matt J
on 29 Nov 2014
Edited: Matt J on 29 Nov 2014
You can't use polyfit() but you can use the standard least squares formula
No, don't do that. Just do
alpha=x\y;
for better conditioning. However, I assume that the OP's case is really more complicated, and that the x matrix does not have full column rank.
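(To illustrate the point, a sketch not from the thread: backslash solves the least-squares problem via an orthogonal factorization, avoiding the squared condition number incurred by explicitly forming x'*x.)

```matlab
alpha = x \ y;          % QR-based least-squares solve; well conditioned
% If x is rank-deficient, backslash warns and returns a basic solution;
% the minimum-norm least-squares solution is available via pinv:
alpha_mn = pinv(x) * y;
```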
Star Strider
on 29 Nov 2014
You are describing a stepwise multiple linear regression. It is a well-known, established technique, and the statistical procedure for adding and removing variables to get the best fit is not trivial.
If you have the Statistics Toolbox, see the documentation for Stepwise Regression and specifically stepwiselm, stepwise, and stepwisefit.
With 127 candidate models for 7 variables (and exponentially more as you add variables), and especially if you have a large data set, it is going to take some time. Have something else to do for a few minutes while the regression runs.
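A minimal usage sketch (variable names assumed; see the stepwiselm documentation for the full option list):

```matlab
% X: N-by-n predictor matrix, y: N-by-1 response (assumed names).
tbl = array2table([X, y]);             % response defaults to the last column
mdl = stepwiselm(tbl, 'constant', ...  % start from the intercept-only model
    'Upper', 'linear', ...             % allow at most all main effects
    'Criterion', 'aic');               % example selection criterion
disp(mdl.Formula)                      % terms the procedure selected
```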
0 Comments
Matt J
on 29 Nov 2014
Edited: Matt J on 29 Nov 2014
As Image Analyst says, performing an OLS regression with the entire data set should give you the unique best regression in one step, unless your x1,...,xn are over-complete.
If they are over-complete, and you are looking for the sparsest solution, the Matching Pursuit algorithm seems to be the standard alternative to an exhaustive search. There are several implementations on the File Exchange, though I've never used any of them.
Also, the solution is not guaranteed to be the globally sparsest; that is the price paid for not doing an exhaustive search, it seems.
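For orientation, a bare-bones greedy forward selection in the spirit of (orthogonal) matching pursuit (not any particular File Exchange submission) might look like this, assuming X has roughly unit-norm columns, y is the response, and k is the target number of predictors:

```matlab
selected = [];
r = y;                                 % start with the full residual
for step = 1:k
    scores = abs(X' * r);              % correlation with current residual
    scores(selected) = -Inf;           % never re-pick a chosen column
    [~, j] = max(scores);
    selected(end+1) = j;               %#ok<AGROW>
    b = X(:, selected) \ y;            % refit OLS on the selected set
    r = y - X(:, selected) * b;        % update the residual
end
```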
0 Comments
See Also
Categories
Find more on Linear and Nonlinear Regression in Help Center and File Exchange