Help creating an array with uniformly distributed random numbers (row-wise) comprised between 0 and 1, with each column having a sum of 1

Question

Simone A. on 14 Jul 2023

Direct link to this question

https://it.mathworks.com/matlabcentral/answers/1996093-help-creating-an-array-with-uniformly-distributed-random-numbers-row-wise-comprised-between-0-and

Commented: Paul on 15 Jul 2023

Accepted answer: Torsten

Hi All,

I am trying to solve this problem, but I am not even sure this is possible.

I need to run a model 1e8+ times. To make sure the model is unbiased, each run needs 4 random values between 0 and 1 whose sum equals 1 (these values will be area fractions).

So I have tried the following:

Random_Fraction_a = rand(1, 4); % Random numbers
Random_Fraction_b = Random_Fraction_a ./ sum(Random_Fraction_a); % Rescaled so that their sum = 1

However, as shown in the figure below, 'rand' returns uniformly distributed random numbers, which is exactly what I am after, but when I normalise them so that they sum to 1, the distribution changes completely. It is my understanding that this is "statistically normal", but it is not what I need. Do you know, or can you think of, any alternative way to obtain 4 random decimal values between 0 and 1 whose sum is 1 and that are uniformly distributed row-wise?

Any help is greatly appreciated!

See figure (and code) below:

% Example code
N = 1e5;
Rand_Fa = zeros(4,N); % preallocate the result arrays
Rand_Fb = zeros(4,N);
for i=1:N
    Random_Fraction_a = rand(1, 4);
    Random_Fraction_b = Random_Fraction_a ./ sum(Random_Fraction_a);
    Rand_Fa(:,i)=Random_Fraction_a;
    Rand_Fb(:,i)=Random_Fraction_b;
end
figure
for ii=1:4
    subplot(2,2,ii)
    histogram(Rand_Fa(ii,:))
    hold on
    histogram(Rand_Fb(ii,:),"FaceAlpha",0.5)
    legend('Rand Output','After Normalisation')
end


Answer 1

Torsten on 14 Jul 2023

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/1996093-help-creating-an-array-with-uniformly-distributed-random-numbers-row-wise-comprised-between-0-and#answer_1273233

Edited: Torsten on 14 Jul 2023

https://uk.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum

You cannot expect the usual uniform distribution for each of the 4 rows of the matrix. This is impossible, since each vector component constrains the others in the same column.

What you can expect is that columns of the matrix are uniformly distributed over the part of the hyperplane x1+x2+x3+x4 = 1 with xi >= 0.

I don't know if this serves your needs.

x = randfixedsum(4,1e5,1,0,1);
for ii=1:4
    subplot(2,2,ii)
    histogram(x(ii,:),"FaceAlpha",0.5)
    legend('After Normalisation')
end

function [x,v] = randfixedsum(n,m,s,a,b)
% [x,v] = randfixedsum(n,m,s,a,b)
%
% This generates an n by m array x, each of whose m columns
% contains n random values lying in the interval [a,b], but
% subject to the condition that their sum be equal to s. The
% scalar value s must accordingly satisfy n*a <= s <= n*b. The
% distribution of values is uniform in the sense that it has the
% conditional probability distribution of a uniform distribution
% over the whole n-cube, given that the sum of the x's is s.
%
% The scalar v, if requested, returns with the total
% n-1 dimensional volume (content) of the subset satisfying
% this condition. Consequently if v, considered as a function
% of s and divided by sqrt(n), is integrated with respect to s
% from s = a to s = b, the result would necessarily be the
% n-dimensional volume of the whole cube, namely (b-a)^n.
%
% This algorithm does no "rejecting" on the sets of x's it
% obtains. It is designed to generate only those that satisfy all
% the above conditions and to do so with a uniform distribution.
% It accomplishes this by decomposing the space of all possible x
% sets (columns) into n-1 dimensional simplexes. (Line segments,
% triangles, and tetrahedra, are one-, two-, and three-dimensional
% examples of simplexes, respectively.) It makes use of three
% different sets of 'rand' variables, one to locate values
% uniformly within each type of simplex, another to randomly
% select representatives of each different type of simplex in
% proportion to their volume, and a third to perform random
% permutations to provide an even distribution of simplex choices
% among like types. For example, with n equal to 3 and s set at,
% say, 40% of the way from a towards b, there will be 2 different
% types of simplex, in this case triangles, each with its own
% area, and 6 different versions of each from permutations, for
% a total of 12 triangles, and these all fit together to form a
% particular planar non-regular hexagon in 3 dimensions, with v
% returned set equal to the hexagon's area.
%
% Roger Stafford - Jan. 19, 2006

% Check the arguments.
if (m~=round(m))|(n~=round(n))|(m<0)|(n<1)
    error('n must be a whole number and m a non-negative integer.')
elseif (s<n*a)|(s>n*b)|(a>=b)
    error('Inequalities n*a <= s <= n*b and a < b must hold.')
end

% Rescale to a unit cube: 0 <= x(i) <= 1
s = (s-n*a)/(b-a);

% Construct the transition probability table, t.
% t(i,j) will be utilized only in the region where j <= i + 1.
k = max(min(floor(s),n-1),0); % Must have 0 <= k <= n-1
s = max(min(s,k+1),k); % Must have k <= s <= k+1
s1 = s - [k:-1:k-n+1]; % s1 & s2 will never be negative
s2 = [k+n:-1:k+1] - s;
w = zeros(n,n+1); w(1,2) = realmax; % Scale for full 'double' range
t = zeros(n-1,n);
tiny = 2^(-1074); % The smallest positive matlab 'double' no.
for i = 2:n
    tmp1 = w(i-1,2:i+1).*s1(1:i)/i;
    tmp2 = w(i-1,1:i).*s2(n-i+1:n)/i;
    w(i,2:i+1) = tmp1 + tmp2;
    tmp3 = w(i,2:i+1) + tiny; % In case tmp1 & tmp2 are both 0,
    tmp4 = (s2(n-i+1:n) > s1(1:i)); % then t is 0 on left & 1 on right
    t(i-1,1:i) = (tmp2./tmp3).*tmp4 + (1-tmp1./tmp3).*(~tmp4);
end

% Derive the polytope volume v from the appropriate
% element in the bottom row of w.
v = n^(3/2)*(w(n,k+2)/realmax)*(b-a)^(n-1);

% Now compute the matrix x.
x = zeros(n,m);
if m == 0, return, end % If m is zero, quit with x = []
rt = rand(n-1,m); % For random selection of simplex type
rs = rand(n-1,m); % For random location within a simplex
s = repmat(s,1,m);
j = repmat(k+1,1,m); % For indexing in the t table
sm = zeros(1,m); pr = ones(1,m); % Start with sum zero & product 1
for i = n-1:-1:1 % Work backwards in the t table
    e = (rt(n-i,:)<=t(i,j)); % Use rt to choose a transition
    sx = rs(n-i,:).^(1/i); % Use rs to compute next simplex coord.
    sm = sm + (1-sx).*pr.*s/(i+1); % Update sum
    pr = sx.*pr; % Update product
    x(n-i,:) = sm + pr.*e; % Calculate x using simplex coords.
    s = s - e; j = j - e; % Transition adjustment
end
x(n,:) = sm + pr.*s; % Compute the last x

% Randomly permute the order in the columns of x and rescale.
rp = rand(n,m); % Use rp to carry out a matrix 'randperm'
[ig,p] = sort(rp); % The values placed in ig are ignored
x = (b-a)*x(p+repmat([0:n:n*(m-1)],n,1))+a; % Permute & rescale x
end
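For the specific call used in this answer (a = 0, b = 1, s = 1), the upper bound x(i) <= 1 is already implied by the sum constraint, so the target set is the standard simplex. The following is a minimal sketch for that special case only. It is not Roger Stafford's algorithm and it does not handle general a, b, s. It relies on the standard result that independent exponential variates divided by their sum are uniformly distributed on the simplex (a Dirichlet(1,...,1) draw). The variable names are illustrative.

```matlab
% Uniform sampling on the simplex x1+...+xn = 1, xi >= 0.
% Assumption: special case a = 0, b = 1, s = 1 only (not randfixedsum).
n = 4;                    % number of area fractions per run
m = 1e5;                  % number of runs (columns)
e = -log(rand(n, m));     % i.i.d. exponential(1) variates
x = e ./ sum(e, 1);       % normalise so that each column sums to 1

% Sanity checks: worst column-sum error and the mean of each row
% (each row mean should be close to 1/n).
maxSumError = max(abs(sum(x, 1) - 1));
rowMeans    = mean(x, 2);
```

The rows of x produced this way are not uniform on [0,1] either; their histograms should show the same decaying shape as the randfixedsum output above.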

6 Comments (the 4 older comments are hidden)

Simone A. on 15 Jul 2023

Thanks for your time and comprehensive explanation @Torsten. Really appreciated!

Paul on 15 Jul 2023

This implementation looks a little better when the comparison to one allows some tolerance. Or maybe use sym and then convert to double after elimination? And of course the V variables are in a regular ndgrid pattern, so an additional random step would be needed to select a column of x, or the columns would have to be scrambled before selecting them sequentially. In the plots below I added the theoretical pdf to the histograms.

v1 = 0:0.01:1;
v2 = 0:0.01:1;
v3 = 0:0.01:1;
v4 = 0:0.01:1;
n = numel(v1);
[V1,V2,V3,V4] = ndgrid(v1,v2,v3,v4);
idx = find(abs(V1+V2+V3+V4-1) <= 1e-12);
V1 = V1(idx);
V2 = V2(idx);
V3 = V3(idx);
V4 = V4(idx);
x = [V1,V2,V3,V4].';
fz = @(z) 3*(z-1).^2; % theoretical pdf
for ii=1:4
    subplot(2,2,ii)
    histogram(x(ii,:),"FaceAlpha",0.5,'Normalization','pdf')
    hold on
    plot(0:.01:1,fz(0:.01:1),'LineWidth',2)
    legend('After Normalisation')
end
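The random selection step mentioned in this comment could be sketched as follows. This is a minimal illustration, not Paul's code: it rebuilds a coarser grid (step 0.05, an arbitrary choice to keep it fast) and shows both options, drawing columns with randi or shuffling them once with randperm. The run count nRuns is hypothetical.

```matlab
% Build a small grid of admissible columns (coarser than above, for speed).
v = 0:0.05:1;
[V1, V2, V3, V4] = ndgrid(v, v, v, v);
keep = abs(V1 + V2 + V3 + V4 - 1) <= 1e-12;       % tolerance on the sum
x = [V1(keep), V2(keep), V3(keep), V4(keep)].';   % 4-by-K, columns sum to 1
K = size(x, 2);

% Option 1: draw one column at random (with replacement) for each run.
nRuns = 1000;                    % hypothetical number of model runs
pick  = randi(K, 1, nRuns);
xRuns = x(:, pick);

% Option 2: scramble the columns once, then read them sequentially.
xShuffled = x(:, randperm(K));
```

Either way the values are restricted to grid points, so this only approximates continuous sampling on the simplex; a finer grid reduces the discretisation at the cost of memory.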


Answer 2

John D'Errico on 14 Jul 2023

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/1996093-help-creating-an-array-with-uniformly-distributed-random-numbers-row-wise-comprised-between-0-and#answer_1273483


randfixedsum.m

This is a mistake I see made frequently: a misunderstanding of what it means for a sample to be uniformly distributed under a sum constraint. The two ideas are somewhat at odds, since a sample cannot be uniformly distributed in the way we expect once that constraint enters the problem.

One approach is to start in 2 dimensions. We have x1 and x2, uniformly distributed, but we want the sum to be 1. The answer is simple: choose x1 to be uniform on the interval [0,1], then compute x2 = 1-x1. We can think of the result as choosing a sample uniformly along the straight line x1+x2=1.

x1 = rand(1,500);
x2 = 1 - x1;
plot(x1,x2,'o')

You can create the desired array now, as

X = [x1;x2];

So the columns of X are uniformly distributed, under a sum constraint. Sadly, things get messier if we want to live in 3-dimensions. We cannot simply choose x1 and x2 independently now, because some of the time, x1+x2 will be greater than 1.
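To make the difficulty concrete, here is a minimal sketch that is not part of the original answer and is not how randfixedsum works. It draws x1 and x2 independently, discards the pairs with x1+x2 > 1, and sets x3 = 1 - x1 - x2. The surviving columns are uniform on the triangle, but roughly half of the candidates are thrown away.

```matlab
% Rejection sketch for 3 fractions summing to 1 (illustrative only).
m  = 1e5;                        % number of candidate pairs
x1 = rand(1, m);
x2 = rand(1, m);
ok = (x1 + x2) <= 1;             % keep only the admissible pairs
X  = [x1(ok); x2(ok); 1 - x1(ok) - x2(ok)];   % 3-by-K, columns sum to 1

acceptRate = mean(ok);           % fraction of candidates that survive
```

With n fractions the same idea accepts only 1/(n-1)! of the candidates (1/6 for n = 4, about 3e-6 for n = 10), which is why a method that never rejects is attractive.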

The randfixedsum code solves the problem elegantly, by sampling correctly so that the set is uniform, but it will lie in a TRIANGLE. Try it.

X = randfixedsum(3,1000,1,0,1);
plot3(X(1,:),X(2,:),X(3,:),'o')
box on
grid on
axis equal
view(48,28)

Now the points are seen to be uniformly distributed inside a TRIANGLE. However, you cannot now look at the marginal distribution of any single variable and expect it to look uniform as well. This is your mistake. In fact, the histogram will now have a triangular shape, with most of the samples near zero.

histogram(X(1,:),50)

Now, try the same thing for 10 rows. I'll use a larger sample this time to make the histogram look pretty.

X = randfixedsum(10,100000,1,0,1);
size(X)

ans = 1×2
    10    100000

histogram(X(1,:),50,'normalization','pdf')

Again, just because the sample is uniform in one respect does NOT mean the marginal distribution should also be uniform. If we could plot the sample on a 10-dimensional monitor (sorry, I don't have one), the points would again be seen to be uniformly sampled inside a simplex (the analogue of a triangle; here a 9-dimensional one sitting in 10-dimensional space).
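The shape of these histograms can be stated exactly. A short derivation, assuming the columns are uniform on the simplex with x_i >= 0 and x_1 + ... + x_n = 1 (the case a = 0, b = 1, s = 1 used throughout this thread):

```latex
% The part of the simplex with x_1 > t is a copy of the whole simplex,
% shrunk by the factor (1 - t) in each of its n - 1 dimensions, so
\[
  P(X_1 > t) = (1 - t)^{\,n-1}, \qquad 0 \le t \le 1 .
\]
% Differentiating 1 - P(X_1 > t) gives the marginal density and its mean:
\[
  f_{X_1}(t) = (n-1)\,(1 - t)^{\,n-2}, \qquad
  \mathbb{E}[X_1] = \frac{1}{n} .
\]
```

For n = 3 this is 2(1-t), the triangular shape seen above; for n = 4 it is 3(1-t)^2, which is the fz used in Paul's comment; and for n = 10 it is 9(1-t)^8, which explains why almost all of the mass sits near zero.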

You CANNOT have everything your intuition demands. Sorry, but too often a simple intuition runs counter to mathematical fact.

1 Comment

Simone A. on 15 Jul 2023

Hi @John D'Errico, thanks a lot for taking the time to explain the rationale behind it; that made everything much easier and clearer. Once again, thanks a lot!
