KS test fails at 0.05 but passes at 0.01

Hello, I need some help verifying the KS test (kstest) that I implemented in the code below. I have attached the data file and compared the theoretical beta CDF with the ECDF. The KS test fails at the 95% level (alpha = 0.05) while it passes at the 99% level (alpha = 0.01, the tight range). Can anyone please check whether my implementation of the KS test is correct? And if my interpretation is wrong, what is the correct one?
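R = load('R.txt');
coeff = betafit(R);                       % MLE of the beta parameters [a b]
a_mle = coeff(1);                         % 1.6941 for this data
b_mle = coeff(2);                         % 4.1671 for this data
figure
histogram(R,20,'Normalization','cdf');    % cumulative histogram of the data
hold on                                   % keep the histogram when adding the curves
[f,x] = ecdf(R);                          % empirical CDF
pd_12 = betacdf(x,a_mle,b_mle);           % theoretical beta CDF at the same x
[h2,p,ksstat] = kstest(R,'CDF',makedist('Beta','a',a_mle,'b',b_mle),'Alpha',0.01)
plot(x,pd_12,'b','LineWidth',2); grid on;
plot(x,f,'LineStyle','-','Color','r','LineWidth',2)
legend('Histogram of data','Theoretical Beta CDF','ECDF of data','Location','best')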
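The ksstat value returned by kstest can be cross-checked by hand. For the one-sample K-S test, the statistic is the largest vertical distance between the ECDF (red curve) and the theoretical CDF (blue curve). Below is a minimal Python sketch of that definition. It is not MATLAB's kstest implementation. It uses a Uniform(0, 1) CDF as a stand-in because the Python standard library has no beta CDF. The function names are hypothetical.

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|.

    `cdf` is assumed to be a continuous CDF given as a Python function.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # The ECDF steps from (i-1)/n up to i/n at x, so check the gap on both sides of the step.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def uniform_cdf(x):
    """CDF of Uniform(0, 1), used here only as a stand-in for betacdf."""
    return min(max(x, 0.0), 1.0)


print(ks_statistic([0.1, 0.4, 0.7], uniform_cdf))
```

Only the points where the ECDF jumps need to be checked, because the largest gap between a step function and a continuous increasing CDF always occurs just before or just after a jump.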

We need a_mle and b_mle to test your code. Are they the coefficients from betafit(R)?
a_mle = 1.6941, b_mle = 4.1671, determined from betafit(R)

Accepted Answer

R = load('R.txt');
coeff = betafit(R);
a_mle = coeff(1);
b_mle = coeff(2);
[~,x] = ecdf(R);
pd_12 = betacdf(x,a_mle,b_mle); % theoretical
[h_05,p_05,ksstat_05] = kstest(R,'CDF', makedist('Beta','a',a_mle,'b',b_mle),'Alpha',0.05)
h_05 = logical
1
p_05 = 0.0142
ksstat_05 = 0.0262
[h_01,p_01,ksstat_01] = kstest(R,'CDF', makedist('Beta','a',a_mle,'b',b_mle),'Alpha',0.01)
h_01 = logical
0
p_01 = 0.0142
ksstat_01 = 0.0262
It is unclear to me what you mean by "passing" and "failing" the test, or by "tight range".
The P-value of the K-S test is equal to 0.0142. Therefore,
  • if alpha is 0.05, the null hypothesis is rejected (h=1)
  • if alpha is 0.01, the null hypothesis is not rejected (h=0)
So, I guess you were interpreting the output incorrectly?
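The two kstest calls above return the same p-value and the same K-S statistic. Alpha only sets the threshold used to turn p into h. The Python sketch below writes out that decision rule with the p-value reported in this thread. The function name reject_null is hypothetical and is not part of any library.

```python
def reject_null(p_value, alpha):
    """Return True (h = 1) when the null hypothesis is rejected at level alpha."""
    return p_value < alpha


p = 0.0142  # p-value reported by kstest for R in this thread
for alpha in (0.05, 0.01):
    h = int(reject_null(p, alpha))
    print(f"alpha = {alpha}: h = {h}")
```

This prints h = 1 for alpha = 0.05 and h = 0 for alpha = 0.01, which matches h_05 and h_01 above.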

I don't understand the p-value...
If alpha = 0.05, this means that 95% of the data (theoretical CDF and ECDF) are compared with each other, while alpha = 0.01 means that 99% of the data are compared. Please correct me if I am wrong here...
Muhammad Abdullah on 16 Oct 2024
Edited: Muhammad Abdullah on 16 Oct 2024
I looked it up: in both cases the low p-value suggests that the data do not follow the beta distribution. Visually, however, the data appear to fit the beta distribution in both the PDF and CDF plots. I have uploaded a snapshot; could you please take a look?
Can we modify the following command?
[h_01,p_01,ksstat_01] = kstest(R,'CDF', makedist('Beta','a',a_mle,'b',b_mle),'Alpha',0.01)
I want to exclude some elements from the upper and lower tails of both the theoretical CDF and the ECDF; for example, the comparison should run from the 5th element to the 2649th.
alpha = 0.05 does not mean that 95% of the data are considered.
The null hypothesis is that the data (R) were drawn from the beta distribution with a_mle = 1.6941, b_mle = 4.1671. Suppose the null hypothesis is true. The P value is the probability that you would see a K-S statistic as large as (or larger than) the one you observed with your data. Your K-S stat is 0.0262, and the probability that a random sample of the same size drawn from the beta distribution gives a K-S stat that large is 0.0142.
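That definition of the P value can be checked by simulation. Suppose the reference CDF F is continuous and fully specified. Then F(X) is Uniform(0, 1) under the null hypothesis, so the null distribution of D depends only on the sample size n and not on the beta parameters. The Python sketch below uses that fact. It estimates the P value as the fraction of simulated uniform samples whose D is at least the observed value. The sample size n and the number of simulations are placeholders; substitute numel(R) for the real data. The function names are hypothetical.

```python
import random


def ks_uniform(u):
    """K-S statistic of a sample against the Uniform(0, 1) CDF F(u) = u."""
    xs = sorted(u)
    n = len(xs)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))


def monte_carlo_p_value(d_obs, n, n_sim=1000, seed=0):
    """Estimate P(D >= d_obs) under the null hypothesis by simulation.

    Uses the probability integral transform: if X ~ F with F continuous,
    then F(X) ~ Uniform(0, 1), so uniform draws give the null distribution of D.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        d_sim = ks_uniform([rng.random() for _ in range(n)])
        if d_sim >= d_obs:
            hits += 1
    return hits / n_sim


# Hypothetical sample size; replace with the number of elements in R.
n = 1000
print(monte_carlo_p_value(0.0262, n))
```

The estimate has Monte Carlo error of roughly sqrt(p(1 - p) / n_sim), so many more simulations are needed before it is comparable to a small p such as 0.0142.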
Should you reject the null hypothesis, based on the data you observed? Well, you should have decided on the value of alpha before observing the data. If you had decided on alpha = 0.05, then you would reject the null hypothesis, because P-value < alpha (0.0142 < 0.05). With alpha = 0.01 you would not reject it, because 0.0142 > 0.01.
Regarding your visual assessment ...
When you have a large amount of data, even relatively small deviations from the theoretical distribution can become statistically significant, even when they are too small to matter in practice.
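The sample-size effect can be seen numerically. The Python sketch below holds the K-S statistic fixed at the observed 0.0262 and evaluates the large-sample (asymptotic Kolmogorov) approximation of the P value for several sample sizes:

Q(λ) = 2 Σ_{k≥1} (−1)^(k−1) exp(−2k²λ²), with λ = √n · D

The sample sizes are illustrative only. kstest does not necessarily use this approximation internally, so the numbers will not exactly match MATLAB's output.

```python
import math


def kolmogorov_sf(lam, terms=100):
    """Asymptotic P(sqrt(n) * D > lam) for the one-sample K-S statistic."""
    # For lam < 0.2 the true value differs from 1 by less than 1e-12,
    # and the alternating series converges too slowly to be useful there.
    if lam < 0.2:
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * s))


d_obs = 0.0262  # K-S statistic reported in this thread
for n in (250, 1000, 4000, 16000):  # illustrative sample sizes
    p = kolmogorov_sf(math.sqrt(n) * d_obs)
    print(f"n = {n:6d}: approximate p = {p:.4f}")
```

With the same D, the approximate p-value is close to 1 at n = 250 and essentially 0 at n = 16000. This is why a fit that looks good by eye can still be rejected when there are thousands of data points.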
Thanks very much for your explanation. Can you help me modify the command? I want to perform the KS test on specific array elements only:
[h_01,p_01,ksstat_01] = kstest(R,'CDF', makedist('Beta','a',a_mle,'b',b_mle),'Alpha',0.01)
It seems strange to want to apply kstest to only specific array elements. Which ones?
What are you trying to achieve? What is the purpose of using this statistical test?
I am using the beta, Weibull, and lognormal distributions. The KS test results favor the beta distribution, but I wanted to see whether, over some range of elements, the null hypothesis would hold for all three distributions.
That's a strange way to think about it. If you exclude values that would be expected in random draws from those distributions, you make the null hypotheses less likely to be accepted.
That being said, if you want to limit R to only those values smaller than R_threshold, then
R_limited = R(R < R_threshold)
is that set of values.
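Here is a minimal Python sketch of the same subsetting idea, restricting values to a range [lo, hi] as in MATLAB's R(R >= lo & R <= hi). It comes with a caveat. Once R is restricted to a range of values, the kept data no longer follow the untruncated beta distribution. A K-S comparison of the subset would therefore need a reference CDF renormalized to that range:

F_trunc(x) = (F(x) − F(lo)) / (F(hi) − F(lo)), for lo ≤ x ≤ hi

The names limit_range and truncated_cdf, the toy data, and the uniform stand-in CDF are all hypothetical and for illustration only.

```python
def limit_range(values, lo, hi):
    """Keep only values with lo <= v <= hi (analogue of R(R >= lo & R <= hi))."""
    return [v for v in values if lo <= v <= hi]


def truncated_cdf(cdf, lo, hi):
    """Return the CDF of the distribution `cdf` truncated to [lo, hi]."""
    f_lo, f_hi = cdf(lo), cdf(hi)
    mass = f_hi - f_lo
    if mass <= 0:
        raise ValueError("the interval [lo, hi] has zero probability under cdf")

    def f(x):
        if x < lo:
            return 0.0
        if x > hi:
            return 1.0
        return (cdf(x) - f_lo) / mass

    return f


def uniform_cdf(x):
    """Uniform(0, 1) CDF, a stand-in for the fitted beta CDF."""
    return min(max(x, 0.0), 1.0)


data = [0.02, 0.15, 0.33, 0.48, 0.61, 0.97]  # toy values, not the thread's data
subset = limit_range(data, 0.1, 0.9)
f_trunc = truncated_cdf(uniform_cdf, 0.1, 0.9)
print(subset, f_trunc(0.5))
```

Comparing the subset against the untruncated CDF instead would push the ECDF and the reference curve apart at both ends of the range. That is one way of seeing why trimming the tails makes the null hypothesis less likely to be accepted.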
