What is the difference between different ways to do least squares?

34 views (last 30 days)
I ran into a problem using different ways to do least squares: I got different results (some quite different), and I want to know why. Basically, I tried three methods to minimize ||Aθ - y||.
theta_train_5k = ((A_train_5k'*A_train_5k)^-1)*A_train_5k'*y_train_5k;
% Least squares via the normal equations with an explicit inverse
theta_train_5k_3 = A_train_5k\y_train_5k;
% Least squares via mldivide (backslash)
theta_train_5k_2 = lsqr(A_train_5k,y_train_5k);
% Least squares via the iterative solver lsqr
These gave different results. I then tried the same three methods with only 100 data points:
theta_train_100 = ((A_train_100'*A_train_100)^-1)*A_train_100'*y_train_100;
% Normal equations with an explicit inverse, now with 100 data points
theta_train_100_3 = A_train_100\y_train_100;
% mldivide for the 100-point system
theta_train_100_2 = lsqr(A_train_100,y_train_100);
% lsqr for the 100-point system
Here the results are even stranger: the entries of theta_train_100 are 1,000 to 100,000 times larger than those of theta_train_100_3 and theta_train_100_2. So I was wondering: when should I use which method? Does it have something to do with the condition number or the singular values of the matrix?
Please help. Thank you in advance.
Variables are in the attachment

Accepted Answer

Matt J on 13 Oct 2025 at 2:54
Edited: Matt J on 13 Oct 2025 at 14:56
The train_100 system is underdetermined, so of course you aren't going to get a unique solution.
For the 5k data, the only reason you see a significant disagreement with lsqr is that you ran lsqr with too few iterations and too loose a tolerance. You can see below that adjusting this reduces the disagreement. In any case, mldivide() is considered the efficient and stable method for small, nonsparse systems like yours, so there is no reason to be using lsqr.
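To see both points numerically, here is a quick diagnostic sketch (assuming the variables from the attachment are loaded):
load myvariable
size(A_train_100)  % if there are fewer rows than columns, the system is underdetermined
rank(A_train_100)  % rank below the column count means infinitely many least squares solutions
cond(A_train_5k)   % cond(A'*A) = cond(A)^2, so the explicit normal equations amplify rounding error the most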
load myvariable
theta_train_5k = ((A_train_5k'*A_train_5k)^-1)*A_train_5k'*y_train_5k;
% Least squares via the normal equations
theta_train_5k_3 = A_train_5k\y_train_5k;
% Least squares via mldivide (backslash)
theta_train_5k_2 = lsqr(A_train_5k,y_train_5k,1e-8,300);
% lsqr with a tighter tolerance (1e-8) and a higher iteration cap (300)
lsqr converged at iteration 183 to a solution with relative residual 0.36.
pdiff=@(a,b) norm(a-b)/norm(a)*100; % percent disagreement function
pdiff(theta_train_5k_3, theta_train_5k )
ans = 1.1997e-11
pdiff(theta_train_5k_3, theta_train_5k_2 )
ans = 7.7810e-04
3 Comments
Zeyuan on 13 Oct 2025 at 15:23
I am a bit confused. If we do not get a unique solution for the train_100 data, how can we still get results in theta_train_100, theta_train_100_2, and theta_train_100_3?
Also, I found that if we add 1e-8,300 to the code, it kind of overfits, so the testing accuracy goes down by 0.2%.
Matt J on 13 Oct 2025 at 15:31
Edited: Matt J on 13 Oct 2025 at 15:47
"I am a bit confused. If we do not get a unique solution for the train_100 data, how can we still get results in theta_train_100, theta_train_100_2, and theta_train_100_3?"
Least squares solutions still exist even when they are not unique (there will be infinitely many of them), but you cannot expect different methods to give you the same one.
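As a concrete sketch (assuming the attached variables are loaded): for the underdetermined train_100 system, backslash returns a basic solution with at most rank(A) nonzero entries, while pinv() returns the minimum-norm solution. The two vectors differ, yet both minimize the residual:
load myvariable
x_basic = A_train_100\y_train_100;         % basic solution (QR with column pivoting)
x_minnorm = pinv(A_train_100)*y_train_100; % minimum-norm least squares solution
norm(x_basic - x_minnorm)                  % the two solutions differ...
norm(A_train_100*x_basic - y_train_100)    % ...but their residuals...
norm(A_train_100*x_minnorm - y_train_100)  % ...are both minimal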
"Also, I found that if we add 1e-8,300 to the code, it kind of overfits, so the testing accuracy goes down by 0.2%."
That doesn't mean the least squares solver made a mistake. The equations you provided were still correctly solved, as we can see from the 3-way agreement between all the solver results.
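You can check this directly (a sketch, assuming the three 5k solutions were computed as above) by comparing residual norms rather than test accuracy:
% All three residual norms should agree to within the lsqr tolerance,
% i.e. every solver minimized ||A*theta - y|| as asked:
norm(A_train_5k*theta_train_5k   - y_train_5k)
norm(A_train_5k*theta_train_5k_3 - y_train_5k)
norm(A_train_5k*theta_train_5k_2 - y_train_5k)
Whether the resulting theta generalizes to the test set is a separate modeling question, not a solver error.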


More Answers (0)
