Generation of Conditional Random Variables
22 views (last 30 days)
I am updating my question as it seems before it was a bit confusing
I hope this is clear now (in case it is not clear, please ask and I will clarify):
Assume I have 16 random variables X1,..,X16 and 16 constant numbers C1,..,C16.
I want to generate samples of (X1+...+X16), considering that X1,..,X16 are random variables (1 to 8 are normally distributed and 9 to 16 are weibull) and the generated samples must always respect the following constraints:
X1/(X1+ ... +X16)=C1
X16/(X1 + ... + X16)=C16
where C1,...C16 are constant values and are known prior to the generation of the RVs.
and C1+...+C16 is always 1 and always positive numbers).
Thanks a lot for any suggestions
Bruno Luong on 9 Jan 2021
Edited: Bruno Luong on 9 Jan 2021
I don't quite understand what you want but from your request you seem to compute
X1 = C1 * S
X2 = C2 * S
X16 = C16 * S
With S being the sum of X_i. Therefore you just need to generate S (since all C are known).
If you know the pdf of X_i and the in principle you would be able to compute the theoretical pdf of S.
For example if X_i are independent and normal distribution with the same standard deviation of sigma, then S is normal distribution with standard deviation of sigma*sqrt(16).
Then all you need is generate randomly S from its pdf, then apply the above scalling then your are done.
If you are not able to compute the theoretical pdf of sum of mixed normal and weibull (admidly not trivial), the sum converges toward the normal distribution with
meanS := mean(S) = sum(mean(X_i))
stdS := std(S) = sqrt(sum(std(X_i)^2))
IMO if you generate S as
S = meanS + stdS*randn()
X = C * S
C being the array of 16 known values, it would well fit your need.
More Answers (2)
Jeff Miller on 3 Jan 2021
Edited: Jeff Miller on 3 Jan 2021
It sounds like the only "free" aspect of your new samples is the new total (say, 241 instead of 240). Once you have that new total, your 16 individual random scores are in the pre-determined proportions of that new total.
So, generate new random totals in the same way as the original--i.e., as the sum of 16 separate random scores--and then compute the 16 individual new random variables as the pre-determined proportions of the new total.
Note that the 16 random variables generated in this way will no longer be normals and weibulls, however, since they are fixed proportions of a (total) random variable that is neither normal nor weibull.
By the way, you can speed up the generation of the total of 8 independent normal random variables, since their total is a single normal with the sum of the means and the sum of the variances.
Paul on 8 Jan 2021
Edited: Paul on 13 Jan 2021
Based on this comment, the problem is as follows:
Let Xi be a set of independent, continuous random variables, i = 1-n.
Let wi be a set of weights, i = 2-n. The goal is find samples of Xi under the constraints Xi/sum(Xi) = wi, i = 2-n.
To solve this problem:
1. transform to a different set of random varibles defined as follows: Y1 = sum(Xi), Yi = Xi/sum(Xi) i = 2-n
2. find the joint pdf of Yi
3. Find the conditional pdf of Y1 given Yi = wi, i = 2-n.
4. sample Y1 from its conditional pdf
5. generate samples of Xi from Y1 and the constraint equations.
For example assume:
X1 ~ N(0,1)
X2 ~ N(1,2)
X3 ~ W(1,1)
X4 ~ W(2,1)
% Step 0
% define the distribution parameters
mu1 = 0; sigma1 = 1;
mu2 = 1; sigma2 = 2;
a3 = 1; b3 = 1;
a4 = 1; b4 = 2;
% define the pdfs of the random variables
syms x1 x2 real
syms x3 x4 real
f_x1(x1) = 1/sigma1/sqrt(2*pi)*exp(-(x1-mu1)^2/(2*sigma1^2)); % -inf < x1 < inf
f_x2(x2) = 1/sigma2/sqrt(2*pi)*exp(-(x2-mu2)^2/(2*sigma2^2)); % -inf < x2 < inf
f_x3(x3) = piecewise(x3 <= 0, 0, x3 > 0, (b3/a3)*((x3/a3)^(b3-1))*exp(-((x3/a3)^b3))); % -inf < x3 < inf
f_x4(x4) = piecewise(x4 <= 0, 0, x4 > 0, (b4/a4)*((x4/a4)^(b4-1))*exp(-((x4/a4)^b4))); % -inf < x4 < inf
% define the joint pdf of Xi
f_x1x2x3x4(x1,x2,x3,x4) = f_x1(x1)*f_x2(x2)*f_x3(x3)*f_x4(x4);
% Step 1
% define the transformation of RVs
syms y1 y2 y3 y4 real
g1 = y1 == x1 + x2 + x3 + x4;
g2 = y2 == x2/(x1 + x2 + x3 + x4);
g3 = y3 == x3/(x1 + x2 + x3 + x4);
g4 = y4 == x4/(x1 + x2 + x3 + x4);
% Step 2
% solve for the inverse equations
h = solve([g1,g2,g3,g4],x1,x2,x3,x4,'Real',true,'ReturnConditions',true);
% find the Jacobian of the inverse solution
J(y1,y2,y3,y4) = jacobian([h.x1, h.x2, h.x3, h.x4],[y1, y2, y3, y4]);
% determinant of J
detJ(y1,y2,y3,y4) = det(J(y1,y2,y3,y4));
% joint density of Yi (edit 12 Jan 2021, math should be abs(detJ). Did not change results.
f_y1y2y3y4(y1,y2,y3,y4) = f_x1x2x3x4(h.x1,h.x2,h.x3,h.x4)*abs(detJ(y1,y2,y3,y4));
% Step 3
% The conditional density of Y1 given Yi = yi is the joint density divided by the marginal density
% marginal density of Yi, i = 2-4
f_y2y3y4(y2,y3,y4) = int(f_y1y2y3y4(y1,y2,y3,y4),y1,-inf,inf);
% conditional density of Y1
f_y1giveny2y3y4(y1,y2,y3,y4) = f_y1y2y3y4(y1,y2,y3,y4)/f_y2y3y4(y2,y3,y4);
% verify conditional density using Jeff Miller's acceptance approach (https://www.mathworks.com/matlabcentral/answers/707693-generation-of-conditional-random-variables#comment_1242578)
simN = 5000000;
X1 = random('Normal',mu1,sigma1,[simN 1]);
X2 = random('Normal',mu2,sigma2,[simN 1]);
X3 = random('Weibull',a3,b3,[simN 1]);
X4 = random('Weibull',a4,b4,[simN 1]);
w2 = 0.3;
w3 = 0.5;
w4 = 0.7;
Y1 = X1 + X2 + X3 + X4;
Y2 = X2./(X1 + X2 + X3 + X4);
Y3 = X3./(X1 + X2 + X3 + X4);
Y4 = X4./(X1 + X2 + X3 + X4);
tolerance = 0.1;
accepted = abs(Y2 - w2) < tolerance & abs(Y3 - w3) < tolerance & abs(Y4 - w4) < tolerance;
title('f(y1 | y2,y3,y4), Analytic and Histogram (Acceptance)')
% Step 4
% sample Y1 using inverse sampling (numerical solution)
syms z real
f(z) = vpa(f_y1giveny2y3y4(z,w2,w3,w4));
Cdf_y1(y1) = int(f(z),-inf,y1);
y1vec = 0:.001:5; % Cdf_y1(0) = 0, Cdf_y1(5) = 0.999998034
Cvec = double(Cdf_y1(y1vec));
usamp = rand(simN,1);
Y1samp = interp1(Cvec,y1vec,usamp);
Y2samp = w2;
Y3samp = w3;
Y4samp = w4;
title('f(y1 | y2,y3,y4), Analytic and Histogram (Direct Inverse Sampling)')
% Step 5
% generate samples of Xi from Yi and the constraint equations
h1 = matlabFunction(h.x1);
X1samp = h1(Y1samp,Y2samp,Y3samp,Y4samp);
X2samp = w2*Y1samp;
X3samp = w3*Y1samp;
X4samp = w4*Y1samp;
title('Conditional PDF of X1');
Here are the results. Seemed to work pretty well for this example. Note that Step 5 runs very fast once Steps 1-4 complete.
I don't know how well this approach scales as the number of RVs increases. There may be other gotcha's as well.