How to take the standard deviation into account to get correct confidence bounds when fitting data using the Curve Fitting Toolbox?

Hello,
I have data that looks like this:
I want to fit the data using the Curve Fitting Toolbox. In the code below, I fit the depicted data using the fit function from the Curve Fitting Toolbox. The fit has the following confidence bounds:
fit1 =
Linear model Poly1:
fit1(x) = p1*x + p2
Coefficients (with 95% confidence bounds):
p1 = 2.117 (1.888, 2.346)
p2 = 0.8444 (0.3498, 1.339)
If you don't have every single data point, but only the mean and standard deviation of the y-values at each x-value, you get the same coefficients, but the confidence bounds are different because the standard deviation is not taken into account. fit2 from the code below has these confidence bounds:
fit2 =
Linear model Poly1:
fit2(x) = p1*x + p2
Coefficients (with 95% confidence bounds):
p1 = 2.117 (1.261, 2.973)
p2 = 0.8444 (-1.004, 2.693)
How can I get correct confidence bounds if I only have the mean and standard deviation of a dataset?
This is the code I used for the examples:
yData = [3.3; 2.8; 2.9; 4.7; 5.1; 5.2; 7.4; 7.0; 7.3];
xData = [1; 1; 1; 2; 2; 2; 3; 3; 3];
fit1 = fit(xData, yData, 'poly1')
yData_mean = [mean(yData(1:3)); mean(yData(4:6)); mean(yData(7:9))]; % computing the mean
yData_std = [std(yData(1:3), 0); std(yData(4:6), 0); std(yData(7:9), 0)];
xData_new = [1; 2; 3];
fit2 = fit(xData_new, yData_mean, 'poly1')

Accepted Answer

John D'Errico on 1 Sep 2018
Using fit? You cannot do so.
You might try using weights, where the weight would be something like the inverse of the variance (one over the squared standard deviation). That should effectively account for the difference in spread at the three levels. But weights are only relative: if you doubled all the weights, you would get exactly the same result. So weighting would NOT give you the same confidence intervals as the fit to the full data. Weights are not the correct solution here.
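The scale invariance of the weights can be checked directly. This is a minimal sketch, assuming the Curve Fitting Toolbox is available; the inverse-variance weights and the variable names w, fitA, fitB are illustrative choices, not something fixed by the question.

```matlab
% Data and per-level summaries, as in the question
yData = [3.3; 2.8; 2.9; 4.7; 5.1; 5.2; 7.4; 7.0; 7.3];
xData_new  = [1; 2; 3];
yData_mean = [mean(yData(1:3)); mean(yData(4:6)); mean(yData(7:9))];
yData_std  = [std(yData(1:3), 0); std(yData(4:6), 0); std(yData(7:9), 0)];

% Assumed weighting: inverse variance at each x-level
w = 1 ./ yData_std.^2;

% Fit once with w and once with all weights doubled
fitA = fit(xData_new, yData_mean, 'poly1', 'Weights', w);
fitB = fit(xData_new, yData_mean, 'poly1', 'Weights', 2*w);

% confint returns a 2-by-2 matrix: rows are lower/upper, columns are p1/p2
ciA = confint(fitA);
ciB = confint(fitB);

% Largest difference between the two sets of bounds (expected to be ~0)
maxDiff = max(abs(ciA(:) - ciB(:)))
```

If the two sets of bounds agree up to rounding, the overall size of the standard deviations never reaches the confidence intervals; only their ratios do.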
To get the same effective confidence intervals, you might use lscov instead, which lets you supply a prior covariance matrix for the data. As a guess, you might need the alternative estimate of the standard deviation, i.e. std with 1 as its second argument, which normalizes by n instead of n-1. I'm not sure about that, at least not without some thought.
yData_std = [std(yData(1:3), 1); std(yData(4:6), 1); std(yData(7:9), 1)];
[coef,coefstd] = lscov([xData_new,ones(3,1)],yData_mean,diag(yData_std.^2))
coef =
2.12457627118644
0.833898305084746
coefstd =
0.0621231336196812
0.144417985825505
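One caveat worth sketching: as far as I read the lscov documentation, it treats the covariance matrix as known only up to a scale factor and multiplies the returned standard errors by sqrt(mse). If the matrix really is the covariance of the means, the documented fix is to divide stdx by sqrt(mse). The sketch below assumes three observations per level (n = 3, taken from the question's data) and uses s^2/n as the variance of each mean; the names V and coefstdExact are illustrative.

```matlab
% Data and per-level summaries, as in the question
yData = [3.3; 2.8; 2.9; 4.7; 5.1; 5.2; 7.4; 7.0; 7.3];
xData_new  = [1; 2; 3];
yData_mean = [mean(yData(1:3)); mean(yData(4:6)); mean(yData(7:9))];
yData_std  = [std(yData(1:3), 0); std(yData(4:6), 0); std(yData(7:9), 0)];
n = 3;                                   % observations per level (assumed known)

A = [xData_new, ones(3, 1)];             % design matrix for p1*x + p2
V = diag(yData_std.^2 / n);              % assumed covariance of the means

% Third output mse is the scale factor that lscov estimates and applies
[coef, coefstd, mse] = lscov(A, yData_mean, V);

% Undo the scaling, treating V as the exact covariance of yData_mean
coefstdExact = coefstd / sqrt(mse);

% Cross-check against the textbook formula sqrt(diag(inv(A' * inv(V) * A)))
coefstdCheck = sqrt(diag((A' * (V \ A)) \ eye(2)));
```

Note that without this rescaling the overall size of V cancels out, just as it does for weights.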
Next, be careful, because the degrees of freedom will be screwed up. So this might make sense:
tinv(.975,3)
ans =
3.18244630528371
This gives confidence intervals of at least similar width. Compare the fit to the full data:
fit2 = fit(xData, yData, 'poly1')
fit2 =
Linear model Poly1:
fit2(x) = p1*x + p2
Coefficients (with 95% confidence bounds):
p1 = 2.117 (1.888, 2.346)
p2 = 0.8444 (0.3498, 1.339)
coef + coefstd*tinv(.975,3)
ans =
2.32227980824704
1.29350079049164
coef - coefstd*tinv(.975,3)
ans =
1.92687273412584
0.374295819677852
I've been sloppy, and probably got something wrong in all this, but it should be close. I might not expect to get exactly the same estimates and confidence intervals.
I have a funny feeling that in order to do better yet, you may need to provide a true covariance matrix to lscov, rather than a diagonal one containing only computed variances.
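One way to do better, sketched under assumptions: if the number of observations behind each mean is also known (the question only mentions mean and standard deviation, so n is an assumption here), the residual sum of squares of the full-data fit can be rebuilt from the summaries as the within-level scatter plus the n-weighted misfit of the means. The variable names are illustrative, and tinv needs the Statistics and Machine Learning Toolbox. With the data from the question, this should reproduce the bounds of the full-data fit.

```matlab
% Per-level summaries; yData is used here only to produce them
yData = [3.3; 2.8; 2.9; 4.7; 5.1; 5.2; 7.4; 7.0; 7.3];
x    = [1; 2; 3];
n    = [3; 3; 3];                        % observations per level (assumed known)
ybar = [mean(yData(1:3)); mean(yData(4:6)); mean(yData(7:9))];
s    = [std(yData(1:3), 0); std(yData(4:6), 0); std(yData(7:9), 0)];

% Weighted least squares on the means, weighting each level by its count
X    = [x, ones(numel(x), 1)];
W    = diag(n);
coef = (X' * W * X) \ (X' * W * ybar);   % [p1; p2]

% Rebuild the full-data residual sum of squares from the summaries
ssWithin = sum((n - 1) .* s.^2);              % scatter around each level mean
ssMisfit = sum(n .* (ybar - X * coef).^2);    % misfit of the means to the line
dof      = sum(n) - numel(coef);              % 9 points - 2 coefficients = 7
mse      = (ssWithin + ssMisfit) / dof;

% Coefficient standard errors and 95% confidence bounds
coefCov = mse * ((X' * W * X) \ eye(2));
coefStd = sqrt(diag(coefCov));
tcrit   = tinv(0.975, dof);
ci      = [coef - tcrit * coefStd, coef + tcrit * coefStd]   % rows: p1, p2
```

The key difference from fitting the three means alone is the degrees of freedom: 7 instead of 1, because the standard deviations carry the information from the six extra points.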
