Generate random values from a pdf within a given interval
Mostra commenti meno recenti
Hi,
I wish to generate random values from a given probability density distribution that's only within a certain interval. Is there a smart way to do this?
Chris
Risposta accettata
Più risposte (2)
John D'Errico
il 5 Set 2016
Modificato: John D'Errico
il 5 Set 2016
It does very much depend on the distribution. And, yes, one could write a book on the subject. Luckily, people already have done so. It very much depends on the existence of an inverse CDF function for your distribution, or an approximation for one. It also helps if a transformation exists between distributions.
There are several valid approaches, and it very much depends on the distribution.
1. acceptance-rejection. Here you generate samples, rejecting those that fall outside the limits. Keep those that are inside. This is great if the probability is high that one will lie inside the acceptable limits. But if the rejection fraction is too large, then you may never see an acceptable result.
2. Transformations. Some distributions are transformable to something simple to work with. Lots of the classic distributions can be transformed exactly into either a beta distribution, a gamma distribution, or a normal distribution. Each of these have direct methods to work with the inverse of the CDF. Sometimes the transformation is only approximate, like that between the Poisson and a normal. It is fine for some Poisson parameters, but not most of them, and not the ones that are interesting.
3. If the inverse CDF is available, or a transformation thereto, then use it!
Again, you can find entire books written on these subjects.
Some of the classic texts (said early in the am, without looking at my bookshelf because it is too dark and I am sleepy) are Johnson & Kotz (Balakrishnan was a co-author on later editions of this multi-volume tome), and Abramowitz & Stegun.
Suppose your goal was to work with a Normal distribution. Even that can be done using either of the main approaches above.
Suppose we wish to generate normal deviates from a standard unit normal, so mean 0, variance 1, but those that lie within the interval [a,b]. As an example, I'll use the interval [1,2]. Acceptance-rejection would work fine here, but lets not.
ab = [1 2];
ab_cdf = normcdf(ab,0,1)
ab_cdf =
0.84134 0.97725
Here, I have the stats toolbox, and it has normcdf. Yes, were I without that useful tool, I could have done the same with a transformation of erf or erfc. So knowing ones special functions is crucial. That is why you need a copy of Abramowitz and Stegun on your shelf.
The above numbers can be viewed as probabilities.
diff(ab_cdf)
ans =
0.13591
By the way, there is a 13.6% rate that rejection will succeed here, so you will need to oversample by about 7-1, rejecting roughly 7 samples for every one that lies inside the interval of interest.
So instead, we now generate UNIFORM deviates, that live within the interval bounded by ab_cdf.
N = 1000;
u = rand(1,N)*diff(ab_cdf) + ab_cdf(1);
x = norminv(u,0,1);
[min(x),max(x)]
ans =
1.0001 1.9961
So it appears as if we have succeeded. In fact, we have. No rejection was needed at all. You could use hist to convince yourself they are Normally distributed, in the interval of interest. Easy to do using normcdf and norminv, but almost as easy using erfc and erfcinv, were I lacking the stats TB. All that is needed is to know the transformation from the special function erfc into the normal CDF.
So understanding how to work with the special functions erf/erfc, and the incomplete beta and gamma functions are all crucial.
Suppose you gave me a distribution where ONLY the PDF was available? Personally, I might decide to compute a very accurate approximation to the CDF using numerical integration, so the CDF, at a discrete set of points. Then I might approximate the CDF using a spline in the form of pchip. (Pchip is the choice because it is guaranteed to be monotonic.) But it is rather easy to form the inverse of a pchip interpolant, so I would now have an effective inverse CDF too.
Any of the above methods will fail SOMETIMES. For example, compute samples from a unit Normal that lie in the interval [1000 1010].
Acceptance-rejection will take somewhat longer than the lifetime of the universe to generate one sample that lies in that interval. And the inverse CDF method will fail due to floating point problems. My solution here would be to recognize there are simple schemes for approximating the tails of the normal CDF way out there. These approximations are even pretty accurate. (Abramowitz & Stegun will be your go-to text here.)
2 Commenti
Chris Egeland Busuttil
il 5 Set 2016
John D'Errico
il 5 Set 2016
There are lots of tricks one can learn here. A quick scan through the appropriate chapter of Johnson & Kotz (& Balakrishnan) is well worth your time. An expensive book as I recall, but I used it a fair amount when I was writing my stats toolbox, and you can surely find it in a university library. My copy of Abramowitz & Stegun was so heavily used that I put tabs on the pages for the probability distributions, beta & gamma functions, and orthogonal polynomials.
KSSV
il 5 Set 2016
Random numbers within a specific range can be generated using:
N = 1000 ;
a = 50;
b = 100;
r = (b-a).*rand(N,1) + a;
1 Commento
Walter Roberson
il 5 Set 2016
That only works for uniform random numbers.
Categorie
Scopri di più su Random Number Generation in Centro assistenza e File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!