Normality test of the Matlab function 'randn'
Mostra commenti meno recenti
Hi Matlab community,
I have generated 2000 samples using Matlab function 'randn', and was trying to test it for normality. The h-value return 0 which is expected, however, the p-value is too high! What is going wrong here? Please see the below simple example:
n = randn(1,2000);
[h,p] = lillietest(n)
h =
0
p =
0.4115 %%% Why it is too high?
[h,p] = kstest(n)
h =
logical
0
p =
0.8158 % Same here!
Risposta accettata
Più risposte (2)
I always use the Wilks-Shapiro test(*) and have little experience with Lillifors test, but some trials with various N have convinced me the MATLAB PRNG for normal isn't much good...especially in the tails. Without more digging than had time for right now, it's not clear which generator it is using by default to get a normal; it says it uses the Mersene Twister for the default uniform stream; it doesn't easily relate to the actual normal generation algorithm that I found quickly, anyways.
I increased the sample size to 10K and it's obvious even from just a histogram the distribution is skewed in the tails...

One observes a lot of white under the theoretical on the left, not nearly so much on the right...for this particular instantiation.
Looking at the probability plot it's a lot more obvious, but the tails caught my eye just from the observed histogram w/o the pdf overlaid on it. As John d' Errico noted just a week or so ago in reply to a question raised about similar observation of unusual result from a chi-square test, "when you have a lot of data, it's far easier to reject the hypothesis" because then the data have to be really, really normal. These aren't quite enough to actually reject the hypothesis, but they're not all that close -- and, in fact, the MATLAB implementation just caps the return p value at 0.5 in this case.

This really shows up the shortness of observation in the LH tails region and, to a lesser extent a slight overage in the right. But, the deviation really begins to show up at about the 0.5% tails regions, not just in the extreme tails as the bulk of the asterisks by that point are above/below the line.
(*) A paper on power of various tests for normality I was made aware of via my past reactor background and interaction with NRC is
And, my link back to the NIST discussion and the other respondents Q? is https://www.mathworks.com/matlabcentral/answers/850710-confusino-regarding-statistical-tests-for-given-distribution?s_tid=srchtitle
3 Commenti
Is 10000 points enough to see the tails? I tried a bigger number and it looks better, at least to my eye:
rng(100);
data = randn(1e7,1);
x = -4:.001:4; p = normpdf(x,0,1);
figure;
histogram(data,'Normalization','pdf');
hold on
plot(x,p,'r');
axis([-4 -3 0 0.002])
figure;
histogram(data,'Normalization','pdf');
hold on
plot(x,p,'r');
axis([3 4 0 0.002])
figure
normplot(data)
MahdiH
il 23 Giu 2021
Peter Perkins
il 24 Giu 2021
Madhi, as I said in my reply below, your expectations of what value the p-value "should have" seem misguided. Expect to see p-values uniformly dist'd between 0 and 1 when drawing from the null hypothesis.
MahdiH
il 22 Giu 2021
3 Commenti
dpb
il 22 Giu 2021
I'm not sure without more digging what would be better options -- I did find a blog by Cleve that talks about the MATLAB algorithm implementing George Marsaglia's ziggurat generator, but it doesn't address just how reliable it really is in practice.
I've really been out of the simulation racket for 20 years so have lost track of what is recent.
I am surprised to see as much deviance in these estimates as are; seems like used to get better plots years ago from the IMSL libraries that used on the mainframes back when that was something was doing routinely. But, it's been a long, long time now since I did MC simulation "in anger"...
dpb
il 23 Giu 2021
Mayhaps John d' Errico will stumble across this thread -- he's a far better mathematician than I and while retired is quite a bit more recently active than I.
He probably would have some salient input...
Peter Perkins
il 24 Giu 2021
Depending on when way back actually was, IMSL on mainframes likely used a multiplicative congruential generator; I don't think we want to go back to those days.
Categorie
Scopri di più su Random Number Generation in Centro assistenza e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!




