The proper way to sample 3 normally or lognormal distributed variables added up to 1

Question

Jesse Chao on 2 Jan 2023

https://it.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1

Commented: Paul on 5 Jan 2023

Hello team,

Is there any way to generate 3 normally or lognormally distributed variables that add up to 1?

For example, I have a human tissue with a volume of 1 L. Three components make up this tissue: vascular space (R_vas), interstitial space (R_int), and cellular space (R_cell). Thus, in this case, R_vas + R_int + R_cell = 1.

To account for variation between human individuals, I would like to sample these three parameters from normal or lognormal distributions (R_vas, R_int and R_cell have means 0.1, 0.1, 0.8 and coefficients of variation 0.3, 0.3, 0.3, respectively) while ensuring that the three parameters add up to 1.

I don't think I can just randomly sample two of the parameters and subtract their sum from 1 to get the third one. In that case, I think the third parameter would not follow the predefined mean and coefficient of variation. Or would it?

How could I achieve this task?

It will be great to have your help.
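A quick numerical check of the concern above can be sketched as follows. It assumes normal sampling with standard deviation = CV × mean (so 0.03 for R_vas and R_int, 0.24 for R_cell) and independent draws for the first two parameters; the variable names are illustrative only.

```matlab
% Sketch: sample R_vas and R_int independently, then set R_cell = 1 - R_vas - R_int.
% Assumption: standard deviation = CV * mean for each parameter.
n      = 1e5;                      % number of simulated individuals
mu     = [0.1 0.1 0.8];            % target means
cv     = [0.3 0.3 0.3];            % target coefficients of variation
sd     = cv .* mu;                 % target standard deviations: 0.03 0.03 0.24

R_vas  = mu(1) + sd(1)*randn(n,1);
R_int  = mu(2) + sd(2)*randn(n,1);
R_cell = 1 - R_vas - R_int;        % forced by the sum constraint

mean_cell = mean(R_cell);          % close to 0.8, as required
sd_cell   = std(R_cell);           % close to sqrt(0.03^2 + 0.03^2), not 0.24
fprintf('mean = %.4f, sd = %.4f (target sd = %.2f)\n', mean_cell, sd_cell, sd(3));
```

Under these assumptions the mean of R_cell comes out right, but its spread is dictated by the other two: sqrt(0.03^2 + 0.03^2) is about 0.042, i.e. a CV of roughly 0.05 rather than 0.3.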


Answer 1 (accepted answer)

John D'Errico on 2 Jan 2023

Edited: John D'Errico on 2 Jan 2023

Let me answer your second question separately. How would you sample three NORMALLY distributed random variables that sum to 1? This part is easy, since it merely requires you to properly construct the necessary covariance matrix.

First, assume we have three variables, with means and standard deviations:

mu = [.1 .1 .8];
sigma = [.3 .3 .3];

But really, we need a covariance matrix that is not a diagonal one. I'll construct it as if I knew the eigenvalue decomposition.

Q = [null([1 1 1]),[1;1;1]/sqrt(3)]
Q = 3×3
   -0.5774   -0.5774    0.5774
    0.7887   -0.2113    0.5774
   -0.2113    0.7887    0.5774
C = Q*diag(3/2*[.3^2 .3^2 0])*Q'
C = 3×3
    0.0900   -0.0450   -0.0450
   -0.0450    0.0900   -0.0450
   -0.0450   -0.0450    0.0900

As you can see, the STANDARD DEVIATIONS (the square roots of the diagonal elements, which are the variances) of the covariance matrix are each 0.3. But the variables are now correlated.
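To see why this particular C pins the sum, note that the variance of X1+X2+X3 is ones(1,3)*C*ones(3,1), which vanishes when [1;1;1] lies in the null space of C. A minimal check follows; the closed form (3/2)*s^2*(I - J/3), with J the all-ones matrix, is a restatement of the construction above rather than something taken from the answer.

```matlab
% Sketch: the covariance above equals (3/2)*s^2*(I - J/3), with J = ones(3).
s  = 0.3;                                   % common standard deviation
Q  = [null([1 1 1]), [1;1;1]/sqrt(3)];      % orthonormal basis; last column is the sum direction
C  = Q*diag(3/2*[s^2 s^2 0])*Q';            % construction from the answer
C2 = 3/2*s^2*(eye(3) - ones(3)/3);          % closed form of the same matrix

err_closed = max(abs(C(:) - C2(:)));        % agreement of the two forms
sd_marg    = sqrt(diag(C))';                % marginal standard deviations, all equal to s
var_sum    = ones(1,3)*C*ones(3,1);         % variance of X1+X2+X3, zero up to rounding
disp([err_closed, var_sum]); disp(sd_marg);
```

The factor 3/2 is what restores the marginal variance: projecting out the sum direction removes one third of each coordinate's variance, and (3/2)*(2/3) = 1.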

Now we can generate a set of numbers that have the desired property, at least over the long term, and within floating point trash. What I did was to carefully construct a singular covariance matrix.

X = mvnrnd(mu,C,10000);
hist(sum(X,2))
mean(X)
ans = 1×3
    0.1037    0.1011    0.7952
std(X)
ans = 1×3
    0.2987    0.3059    0.2997
format long g
[min(sum(X,2)),max(sum(X,2))]
ans = 1×2
    0.9999999675926    1.00000003250349

So to within floating point trash, they sum to 1. At least as close as we can come based on double precision arithmetic in what I did. They have the desired means and variances.
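If the Statistics and Machine Learning Toolbox (which provides mvnrnd) is not at hand, samples with the same covariance can be drawn in base MATLAB by pushing two independent standard normals through a basis of the plane x+y+z = 0. This is a sketch under the same assumptions as above (mu sums to 1, common standard deviation 0.3); it relies on implicit expansion, so it needs R2016b or later.

```matlab
% Sketch: draw from N(mu, C) with singular C using only randn.
mu = [0.1 0.1 0.8];                 % means, must sum to 1
s  = 0.3;                           % common standard deviation
n  = 10000;                         % number of samples
B  = null([1 1 1]);                 % 3x2 orthonormal basis of the plane x+y+z = 0
W  = randn(n,2);                    % independent standard normals, two per sample
X  = mu + sqrt(3/2)*s*(W*B');       % covariance is (3/2)*s^2*(B*B')

sum_range = [min(sum(X,2)), max(sum(X,2))];   % both ends equal 1 up to rounding
sample_sd = std(X);                           % each column close to 0.3
disp(sum_range); disp(sample_sd);
```

Because the sum direction is never generated in the first place, the row sums depend only on rounding in the final additions.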

Note that these variables may be a problem for what you are doing, because some of the elements will often be negative.

hist(min(X,[],2))

That is a likely event when the sum is required to be 1, and you are asking about NORMALly distributed random variables. This is also a reason why your goal MUST fail for the lognormal case.
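How likely can be read off the marginals: for a normal variable with mean m and standard deviation s, the probability of a negative draw is Phi(-m/s). A short sketch using base MATLAB's erfc, with the means and standard deviations used in this answer:

```matlab
% Sketch: probability that each normal marginal falls below zero.
mu    = [0.1 0.1 0.8];                       % means used in the answer
sigma = [0.3 0.3 0.3];                       % standard deviations used in the answer
p_neg = 0.5*erfc((mu./sigma)/sqrt(2));       % Phi(-mu/sigma) for each component
fprintf('P(X_i < 0) = %.4f %.4f %.4f\n', p_neg);
```

Each of the two small components is negative in roughly 37% of draws with these parameters, so negative values are far from rare.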

Paul on 5 Jan 2023

I would attack the problem this way

N = [eye(2);[-1 -1]];

N forms a basis for the null space of [1 1 1]

[1 1 1]*N
ans = 1×2
     0     0

Define C as follows

syms sigma_x sigma_y rho_xy

C = N*[sigma_x^2 rho_xy*sigma_x*sigma_y;rho_xy*sigma_x*sigma_y sigma_y^2]*N.'

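C =
[sigma_x^2, rho_xy*sigma_x*sigma_y, - sigma_x^2 - rho_xy*sigma_x*sigma_y]
[rho_xy*sigma_x*sigma_y, sigma_y^2, - sigma_y^2 - rho_xy*sigma_x*sigma_y]
[- sigma_x^2 - rho_xy*sigma_x*sigma_y, - sigma_y^2 - rho_xy*sigma_x*sigma_y, sigma_x^2 + 2*rho_xy*sigma_x*sigma_y + sigma_y^2]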

We see that we can define the covariance of X and Y (upper 2x2 block), but we get what we get for the variance of Z and its correlation with the other two variables.
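The "we get what we get" part can be made concrete: since Z = 1 - X - Y, its variance is sigma_x^2 + sigma_y^2 + 2*rho_xy*sigma_x*sigma_y, which is the (3,3) entry of C. A numeric sketch that avoids the Symbolic Math Toolbox (the helper name build_C is hypothetical):

```matlab
% Sketch: numeric version of C = N*S*N.' and the implied standard deviation of Z.
N = [eye(2); [-1 -1]];                                   % basis of the null space of [1 1 1]
build_C = @(sx, sy, rho) N*[sx^2 rho*sx*sy; rho*sx*sy sy^2]*N.';   % hypothetical helper

sx = 0.3; sy = 0.3;                                      % standard deviations of X and Y
rho  = [-1 -0.5 0 0.5 1];                                % candidate correlations of X and Y
sd_z = zeros(size(rho));
for k = 1:numel(rho)
    C = build_C(sx, sy, rho(k));
    sd_z(k) = sqrt(max(C(3,3), 0));                      % guard against tiny negative rounding
end
disp([rho; sd_z]);                                       % first row: rho, second row: sd of Z
```

With equal standard deviations for X and Y, only rho_xy = -0.5 gives Z the same standard deviation. In general the standard deviation of Z cannot exceed sigma_x + sigma_y, which is reached at rho_xy = 1.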

Example from @John D'Errico formed by

C = N*[.3^2 -0.5*0.3*.3;-0.5*.3*.3 .3^2]*N'
C = 3×3
    0.0900   -0.0450   -0.0450
   -0.0450    0.0900   -0.0450
   -0.0450   -0.0450    0.0900

But we can pick other parameters. For example

C = N*[0.3^2 0.7*0.3*1.3;0.7*0.3*1.3 1.3^2]*N'

C = 3×3
    0.0900    0.2730   -0.3630
    0.2730    1.6900   -1.9630
   -0.3630   -1.9630    2.3260

mu = [0.2 0.3 0.5]; % must sum to 1

histogram(sum(mvnrnd(mu,C,1e5),2));

Geometrically, the ellipsoid is collapsing down to an ellipse centered on mu and lying in the plane x+y+z=1. There's probably even a further degenerate case where the ellipsoid collapses to a line going through mu and lying in that plane.

Bruno Luong on 5 Jan 2023

The problem is, as John pointed out, that x, y, z can take negative values. If negative quantities have a physical interpretation for fractional volumes of human tissue, then it's OK.

Paul on 5 Jan 2023

As John did, I'm just addressing the question of how to specify mu and C such that mvnrnd(mu,C,n) returns values that sum to unity. Maybe that's not the correct question to ask for what the OP is really interested in.


Answer 2

Torsten on 2 Jan 2023

You can define a joint probability distribution of the three variables on the triangle

x + y + z = 1, x, y, z >= 0

but this cannot be a normal or lognormal distribution for each of the random variables as you suggested.

You can also divide the result of individual sampling by R_vas + R_int + R_cell, but this will change the assumed distributions for R_vas, R_int and R_cell.
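A minimal sketch of that second option shows how much the normalization shifts the moments. It assumes independent lognormal draws parameterized from the means and coefficients of variation in the question; the conversion sigma_log^2 = log(1 + CV^2), mu_log = log(mean) - sigma_log^2/2 is the usual lognormal moment relation, and the variable names are illustrative.

```matlab
% Sketch: sample independent lognormals, then rescale each row to sum to 1.
n  = 1e5;                             % number of simulated individuals
m  = [0.1 0.1 0.8];                   % target means (R_vas, R_int, R_cell)
cv = [0.3 0.3 0.3];                   % target coefficients of variation
s_log  = sqrt(log(1 + cv.^2));        % lognormal shape parameters
mu_log = log(m) - s_log.^2/2;         % lognormal location parameters

raw = exp(mu_log + s_log.*randn(n,3));    % independent lognormal draws (implicit expansion)
R   = raw ./ sum(raw,2);                  % rows now sum to 1, all entries positive

cv_raw  = std(raw)./mean(raw);        % close to 0.3 for every column
cv_norm = std(R)./mean(R);            % CVs after normalization
disp([mean(R); cv_raw; cv_norm]);     % rows: normalized means, raw CVs, normalized CVs
```

In runs of this sketch the CV of the normalized R_cell falls far below 0.3, because dividing by a sum that is dominated by R_cell itself cancels most of its variation.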

Jesse Chao on 5 Jan 2023

Hello Torsten,

Thank you for the suggestions and reminders.

Jesse


Answer 3

John D'Errico on 2 Jan 2023

Edited: John D'Errico on 2 Jan 2023

Magic?

You have three variables, with means that will at least get you in the right ballpark. The goal, however, is to ensure the sum is exactly 1. What property does a lognormal variable have? You can generate one by generating a normally distributed random variable and then exponentiating it. So effectively, the log of a lognormal variate is normally distributed.

Now, if your goal were to find three lognormally distributed variables whose PRODUCT is 1, this problem would be far easier. You would find a set of three normal variables whose sum is zero. When you exponentiate, the product is automatically 1. And finding a set of normal variates with a sum of zero is almost trivial. (No reason to get into that here, as it is not pertinent to your problem.)
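Although it does not solve the sum problem, the product construction is short enough to sketch. Subtracting the row mean is one convenient way (an assumption of this sketch, not taken from the answer) to obtain normal variates that sum to zero.

```matlab
% Sketch: three lognormal variates per row whose PRODUCT is 1.
n = 1000;                       % number of samples
Z = randn(n,3);                 % independent standard normals
Z = Z - mean(Z,2);              % project onto the plane z1+z2+z3 = 0; still jointly normal
X = exp(Z);                     % each column is lognormal

max_sum_err  = max(abs(sum(Z,2)));        % zero up to rounding
max_prod_err = max(abs(prod(X,2) - 1));   % product is 1 up to rounding
disp([max_sum_err, max_prod_err]);
```

The projection shrinks the variance of each normal variate from 1 to 2/3, so a rescaling of Z would be needed to hit a particular lognormal shape.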

Anyway, you are correct in that you CANNOT just sample two variates, then choose the third to force that sum to 1. But let me back up. Does this entire question even make mathematical sense? Sadly, no. Remember that a lognormal variable lives on the interval (0,inf). So there is a strong chance that any one of those variables is itself greater than 1. And when that happens, the sum can NEVER be 1, because the other two terms in the sum can never be negative. So you cannot have a situation where the sum is constrained to be a constant. Sorry.
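That chance can be quantified for the numbers in the question. Assuming R_cell is lognormal with mean 0.8 and coefficient of variation 0.3 (moment-matched parameters, which is an assumption of this sketch), the probability that it exceeds 1 on its own is:

```matlab
% Sketch: P(R_cell > 1) for a lognormal with mean 0.8 and CV 0.3.
m  = 0.8;  cv = 0.3;                        % mean and coefficient of variation
s_log  = sqrt(log(1 + cv^2));               % shape parameter of the lognormal
mu_log = log(m) - s_log^2/2;                % location parameter of the lognormal
z      = (log(1) - mu_log)/s_log;           % standardized log-threshold
p_gt1  = 0.5*erfc(z/sqrt(2));               % upper-tail normal probability, 1 - Phi(z)
fprintf('P(R_cell > 1) = %.3f\n', p_gt1);
```

That comes to roughly 0.18, so close to one draw in five already breaks the constraint before the other two (strictly positive) terms are added.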

At best, you could think about a set of TRUNCATED lognormal variables, that sum to 1. And even then, the truncation point would be difficult to quantify.


Answer 4

Bruno Luong on 2 Jan 2023

Edited: Bruno Luong on 2 Jan 2023

This will generate 3 positive random variables that sum to 1, with means 8/10, 1/10, 1/10:

nu=[8 1 1];
B=arrayfun(@(n) ones(1,n),nu,'unif', 0);
A=blkdiag(B{:});
m=size(A,2);
p=3; % larger p will give smaller std
% these two lines generate proper uniform conditional distributions:
% each of the p columns sums to 1 before the columns are averaged
r=-log(rand(m,p)); 
r = sum(r./sum(r,1),2)/p;
%
r=A*r;
r
r = 3×1
    0.7922
    0.1397
    0.0680

It will NOT be normal, but it is kind of "normal" in the sense of the central limit theorem, i.e. a limit of a sum of a "large" number of independent random variables.
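How p controls the spread can be worked out in closed form. With p = 1 the vector r before grouping is uniform on the simplex, i.e. Dirichlet(1,...,1), so each grouped component is Beta(nu_i, m - nu_i); averaging p independent copies divides the variance by p. The Beta/Dirichlet identification is an interpretation of the code above, not something stated in the answer. A sketch of the resulting coefficients of variation:

```matlab
% Sketch: theoretical mean, standard deviation and CV of A*r as a function of p.
nu = [8 1 1];                               % group sizes from the answer
m  = sum(nu);                               % number of exponential variates per column
p  = [1 3 9 20];                            % numbers of averaged columns to compare

mean_r = nu(:)/m;                           % 0.8, 0.1, 0.1 regardless of p
var1   = nu(:).*(m - nu(:))/(m^2*(m + 1));  % Beta(nu_i, m - nu_i) variance, p = 1
sd_r   = sqrt(var1./p);                     % 3-by-4: rows = components, columns = values of p
cv_r   = sd_r./mean_r;                      % coefficient of variation
disp(cv_r);
```

With these group sizes, p = 9 gives the two small components a CV close to 0.3, but the large component then has a CV of only about 0.05, so the three requested CVs cannot all be matched by tuning p alone.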

Bruno Luong on 3 Jan 2023

Edited: Bruno Luong on 3 Jan 2023

Here is the histogram of the results using my code. It is not a Gaussian curve but a skewed bell shape:

When you increase p, the histogram will be narrower but less skewed.

nu=[8 1 1];
B=arrayfun(@(n) ones(1,n),nu,'unif', 0);
A=blkdiag(B{:});
m=size(A,2);
p=3; % larger p will give smaller std
rtab = zeros(size(A,1),1e6);
for k=1:size(rtab,2)
    % these two lines generate proper uniform conditional distributions:
    % each of the p columns sums to 1 before the columns are averaged
    r=-log(rand(m,p));
    r = sum(r./sum(r,1),2)/p;
    %
    rtab(:,k)=A*r;
end
figure
subplot(2,2,1)
histogram(rtab(1,:))
subplot(2,2,3)
histogram(rtab(2,:))
subplot(2,2,4)
histogram(rtab(3,:))

Run with p = 20

Jesse Chao on 5 Jan 2023

Hello Bruno,

Thank you for the solutions.

This somehow could solve the problem.

Jesse
