How to use chi2gof within CUPID

Sim on 22 Jun 2023
Commented: Sim on 26 Jun 2023
[The same question on the CUPID GitHub]
Here are two examples of using MATLAB's chi-square goodness-of-fit test function, chi2gof:
First (comparing two frequency distributions):
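Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample = [647, 486, 100, 22, 0, 0, 0, 0];
% pool bins 4-8, whose counts would otherwise be very small
Population2 = [996, 749, 370, sum(Population(4:8))];
Sample2 = [647, 486, 100, sum(Sample(4:8))];
% expand the counts into raw observations: value i repeated Sample2(i) times
x = repelem(1:length(Sample2), Sample2);
% 'Expected' must be counts for the same number of observations as the data,
% so scale the population counts (total 2181) to the sample size (1255)
expected = sum(Sample2) * Population2 / sum(Population2);
edges = .5+(0:length(Sample2));
[h,p,k] = chi2gof(x,'Expected',expected,'Edges',edges)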
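As a cross-check of the first example, the sketch below computes Pearson's statistic by hand and compares it with chi2gof. It assumes the population proportions are the reference, so the population counts are rescaled to the sample total before comparison. It passes the four bin counts to chi2gof through 'Ctrs' and 'Frequency', the same pattern as the second example, so no raw vector x is needed.

```matlab
% Collapsed counts from the first example
Population2 = [996, 749, 370, 66];   % 66 = 53+9+3+1+0
Sample2     = [647, 486, 100, 22];

% Expected counts under the population proportions, scaled to the sample size
O = Sample2;
E = sum(O) * Population2 / sum(Population2);

% Pearson statistic and upper-tail p-value (no parameters estimated)
chi2Manual = sum((O - E).^2 ./ E);
dfManual   = numel(O) - 1;
pManual    = chi2cdf(chi2Manual, dfManual, 'upper');

% The same test through chi2gof, using bin centres and frequencies
ctrs = 1:numel(O);
[h,p,st] = chi2gof(ctrs,'Ctrs',ctrs,'Frequency',O,'Expected',E);
fprintf('manual: chi2 = %.3f, p = %.3g | chi2gof: chi2 = %.3f, p = %.3g\n', ...
        chi2Manual, pManual, st.chi2stat, p)
```

All four expected counts are well above 5, so chi2gof pools no bins and both routes should use 3 degrees of freedom.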
Second (fitting a distribution to data):
bins = 0:5;
obsCounts = [6 16 10 12 4 2];
n = sum(obsCounts);
pd = fitdist(bins','Poisson','Frequency',obsCounts');
expCounts = n * pdf(pd,bins);
[h,p,st] = chi2gof(bins,'Ctrs',bins,...
'Frequency',obsCounts, ...
'Expected',expCounts,...
'NParams',1)
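For reference, chi2gof compares observed counts $O_i$ with expected counts $E_i$ through Pearson's statistic, and reduces the degrees of freedom by the number of estimated parameters. Here $K$ is the number of bins left after bins with expected counts below 'EMin' (default 5) have been pooled:

$$
\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}, \qquad \mathrm{df} = K - 1 - \mathrm{NParams}
$$

In the Poisson example only $\lambda$ is estimated by fitdist, hence 'NParams',1.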
But how can I use chi2gof with CUPID?
Below is an example in which I would like to apply MATLAB's chi2gof function:
addpath('.../Cupid-master')
% (1) create a "truncated dataset"
pd = makedist('Weibull','a',3,'b',5);
t = truncate(pd,3,inf);
data_trunc = random(t,10000,1);
% (2) define the distribution to fit to the truncated dataset: a CUPID Weibull2, truncated below at 3
fittedDist = TruncatedXlow(Weibull2(2,2),3);
% (3) estimate the Weibull parameters by maximum likelihood, allowing for the truncation.
fittedDist.EstML(data_trunc);
% (4) plot both the truncated dataset (histogram) and the fitted distribution
% (here the Weibull2 with parameters estimated by maximum likelihood)
figure
xgrid = linspace(0,100,1000)';
histogram(data_trunc,100,'Normalization','pdf','facecolor','blue')
line(xgrid,fittedDist.PDF(xgrid),'Linewidth',2,'color','red')
xlim([2.5 6])
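A minimal sketch of the approach this thread converges on: take expected counts from differences of the fitted CDF at the bin edges, then pass the same edges, the expected counts and the number of estimated parameters to chi2gof. To keep it runnable without CUPID, the fitted model is replaced by the true truncated Weibull from the Statistics Toolbox (so nothing is estimated and NParams is 0). The helper expectedFromCdf is hypothetical, not a CUPID or MATLAB function. Passing @(x) fittedDist.CDF(x) with NParams = 2 for the CUPID Weibull2 is an assumption that CDF accepts a vector and that EstML estimates two parameters.

```matlab
% Hypothetical helper: expected counts N*[F(e_i) - F(e_{i-1})]. The outer
% edges are replaced by F = 0 and F = 1, so the first and last bins absorb
% the tails and sum(E) equals N.
expectedFromCdf = @(cdfFun, edges, N) ...
    N * diff([0, reshape(cdfFun(edges(2:end-1)), 1, []), 1]);

rng(1)                                    % reproducible example
pd = makedist('Weibull','a',3,'b',5);     % same generating model as the question
t  = truncate(pd,3,inf);
data_trunc = random(t,10000,1);

% Stand-in model: the true truncated Weibull, so no parameters are estimated.
% With CUPID (assumption): cdfFun = @(x) fittedDist.CDF(x); nParams = 2;
cdfFun  = @(x) cdf(t, x);
nParams = 0;

num_bins = 30;
edges = linspace(min(data_trunc), max(data_trunc), num_bins+1);
E = expectedFromCdf(cdfFun, edges, numel(data_trunc));
[h,p,st] = chi2gof(data_trunc,'Edges',edges,'Expected',E,'NParams',nParams)
```

Folding the tails into the end bins is a design choice: with edges running from min(data) to max(data), plain CDF differences would sum to slightly less than N.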

Accepted Answer

Jeff Miller on 23 Jun 2023
Yes, that is correct. The successive bin probabilities are the differences of the successive CDF values, and the expected number is the total N times the bin probability--just as you have computed it.
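In symbols, with $F$ the fitted CDF (fittedDist.CDF in CUPID), bin edges $e_0 < e_1 < \dots < e_K$ and $N$ observations, the rule in this answer reads:

$$
E_i = N\,\bigl[F(e_i) - F(e_{i-1})\bigr], \qquad i = 1, \dots, K
$$

If $e_0$ is the lower truncation point and $e_K$ the upper end of the support, the $E_i$ sum to $N$; with edges from min(data) to max(data), as in the tests below, they fall slightly short of $N$.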
Sim on 23 Jun 2023
Thanks a lot @Jeff Miller, very kind!! :-)
Sim on 26 Jun 2023
To future readers:
I accepted @Jeff Miller's answer
"Yes, that is correct. The successive bin probabilities are the differences of the successive CDF values, and the expected number is the total N times the bin probability--just as you have computed it."
because it confirms the approach in my own answer (see my two examples, "Test 1" and "Test 2"):
"I might have found a solution that makes sense to me and gives me what I would expect, even though I am not 100% sure it is correct... maybe, experts of CUPID and chi2gof might tell me if this is correct.... Test 1.... Test 2....."


More Answers (1)

Sim on 22 Jun 2023
Edited: Sim on 22 Jun 2023
I might have found a solution that makes sense to me and gives the results I would expect, although I am not 100% sure it is correct. Perhaps experts on CUPID and chi2gof could tell me whether it is:
Test 1: I generate an artificial dataset from a distribution (A) and fit it with the same distribution (A)
% (1) create a "truncated dataset"
pd = makedist('Exponential','mu',1); % <-- dataset following a distribution (A)
whereToTruncate = 2;
t = truncate(pd,whereToTruncate,inf);
data_trunc = random(t,10000,1);
% (2) define the distribution to fit to the truncated dataset
fittedDist = TruncatedXlow(Exponential(1),whereToTruncate); % <-- fitting distribution (A)
% (3) estimate the distribution parameters by maximum likelihood, allowing for the truncation.
fittedDist.EstML(data_trunc);
% (4) plot both the truncated dataset (histogram) and the fitted distribution
figure
xgrid = linspace(0,10,1000)';
num_bins = 50;
hold on
histogram(data_trunc,num_bins,'Normalization','pdf','facecolor','blue')
line(xgrid,fittedDist.PDF(xgrid),'Linewidth',2,'color','red')
hold off
xlim([0 7])
% (5) calculate the Chi-square goodness-of-fit test (chi2gof)
bin_edges = linspace(min(data_trunc), max(data_trunc), num_bins+1);
expected_values = numel(data_trunc) * diff(fittedDist.CDF(bin_edges));
[h,p,st] = chi2gof(data_trunc, 'Expected', expected_values)
% Output Test 1
h =
0
p =
0.55248
st =
struct with fields:
chi2stat: 21.469
df: 23
edges: [2.0001 2.2661 2.5321 2.7982 3.0642 3.3302 3.5963 3.8623 4.1283 4.3944 4.6604 4.9264 5.1925 5.4585 5.7245 5.9906 ]
O: [2368 1798 1344 1107 810 594 442 333 294 212 165 116 113 68 53 37 33 28 15 15 18 11 5 21]
E: [2348.7 1797.1 1375 1052 804.95 615.89 471.24 360.56 275.87 211.08 161.5 123.57 94.548 72.341 55.351 42.35 32.404 ]
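Two details of the Test 1 call could be made explicit. It passes 'Expected' without 'Edges', so the binning is left to chi2gof rather than tied to bin_edges, and it omits 'NParams', so df does not account for the parameter fitted by EstML. The sketch below passes both. To stay runnable without CUPID it replaces fittedDist.EstML with the closed-form maximum-likelihood estimate for a left-truncated exponential (by memorylessness, x - c is again exponential, so the estimate of mu is mean(x) - c); that substitution is an assumption for illustration, not the original code.

```matlab
rng(0)                                    % reproducible example
whereToTruncate = 2;
t = truncate(makedist('Exponential','mu',1), whereToTruncate, inf);
data_trunc = random(t,10000,1);
N = numel(data_trunc);

% Stand-in for fittedDist.EstML: ML estimate of mu for an exponential
% truncated below at c is mean(x) - c. One parameter is estimated.
muHat  = mean(data_trunc) - whereToTruncate;
cdfFun = @(x) expcdf(x - whereToTruncate, muHat);   % with CUPID: @(x) fittedDist.CDF(x)

num_bins = 50;
edges = linspace(min(data_trunc), max(data_trunc), num_bins+1);
% Expected counts from CDF differences; the end bins absorb the tails
E = N * diff([0, cdfFun(edges(2:end-1)), 1]);
[h,p,st] = chi2gof(data_trunc,'Edges',edges,'Expected',E,'NParams',1)
```

With 'NParams',1 the reported df is the number of bins after pooling minus 2, rather than minus 1 as in the Test 1 output.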
Test 2: I generate an artificial dataset from a distribution (A) and fit it with a different distribution (B)
% (1) create a "truncated dataset"
pd = makedist('Exponential','mu',1); % <-- dataset following a distribution (A)
whereToTruncate = 2;
t = truncate(pd,whereToTruncate,inf);
data_trunc = random(t,10000,1);
% (2) define the distribution to fit to the truncated dataset
fittedDist = TruncatedXlow(Normal(0,1),whereToTruncate); % <-- fitting distribution (B)
% (3) estimate the distribution parameters by maximum likelihood, allowing for the truncation.
fittedDist.EstML(data_trunc);
% (4) plot both the truncated dataset (histogram) and the fitted distribution
figure
xgrid = linspace(0,10,1000)';
num_bins = 50;
hold on
histogram(data_trunc,num_bins,'Normalization','pdf','facecolor','blue')
line(xgrid,fittedDist.PDF(xgrid),'Linewidth',2,'color','red')
hold off
xlim([0 7])
% (5) calculate the Chi-square goodness-of-fit test (chi2gof)
bin_edges = linspace(min(data_trunc), max(data_trunc), num_bins+1);
expected_values = numel(data_trunc) * diff(fittedDist.CDF(bin_edges));
[h,p,st] = chi2gof(data_trunc, 'Expected', expected_values)
% Output Test 2
h =
1
p =
6.4417e-116
st =
struct with fields:
chi2stat: 628.59
df: 26
edges: [2.0001 2.1895 2.3789 2.5682 2.7576 2.947 3.1364 3.3258 3.5152 3.7046 3.8939 4.0833 4.2727 4.4621 4.6515 4.8409 ]
O: [1742 1409 1198 959 798 699 561 463 391 295 266 205 162 135 114 102 86 73 56 51 39 30 22 18 16 20 90]
E: [1386.2 1248.4 1114.2 985.49 863.77 750.27 645.8 550.88 465.67 390.1 323.84 266.42 217.2 175.48 140.5 111.47 87.65 ]
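The rejection in Test 2 does not hinge on the degrees of freedom. The reported df (26) is the 27 pooled bins in O minus 1, i.e. NParams = 0, although EstML estimated the Normal's parameters (assumed here to be two, mu and sigma). Recomputing the p-value from the reported statistic with NParams = 2 only makes it smaller, because for a fixed statistic the upper-tail probability decreases as df decreases:

```matlab
% Values copied from the Test 2 output
chi2stat    = 628.59;
nBinsPooled = 27;            % number of entries in st.O
pReported   = 6.4417e-116;   % computed with df = 26

% Assumption: EstML estimated two parameters (mu, sigma) of the Normal
nParams = 2;
df = nBinsPooled - 1 - nParams;
p  = chi2cdf(chi2stat, df, 'upper');
fprintf('df = %d, p = %.3g (reported with df = %d: %.3g)\n', ...
        df, p, nBinsPooled - 1, pReported)
```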
