Generating a dispersed (non-integer) random matrix/array that sums to a particular value

The most frequently suggested method (in fact the only one I have found) for generating random numbers (<1) that sum to 1 is Random Vectors with Fixed Sum by Roger Stafford. However, I noticed that the generated data are not well dispersed, e.g.,
P = randfixedsum(10,10000,1,0.05,0.9); % a 10-by-10000 matrix where each column of P sums to 1 and each element is between 0.05 and 0.9
find(any(P>0.5))
ans =
1×0 empty double row vector
So far, every single time I tried, the result was an empty vector: the values always stay below 0.5. Is there a way I could generate more dispersed data that would include values anywhere between 0.05 and 0.9 (for the above example)?
Thanks in advance for your kind help.
FYI: I have also tried the following (adapted from another MATLAB Answers post):
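function P = rand_fixed_sum_2(p,n) % p columns, n rows; each column sums to 1
n1 = 10^(n-1);                % size of the integer grid the cut points come from
m = 1:n1;                     % the grid 1..n1 (memory grows as 10^(n-1))
P = zeros(n,p);               % preallocate the output
for j = 1:p
    a = m(sort(randperm(n1,n)));  % n distinct, sorted grid points
    b = diff(a);              % n-1 gaps between consecutive points
    b(end+1) = n1-sum(b);     % wrap-around gap, so that sum(b) == n1
    P(:,j) = (b/sum(b))';     % normalize so the column sums to 1
end
end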
But this choice of n1 = 10^(n-1) obviously becomes infeasible in higher dimensions (n > 5). For lower dimensions, however, by tweaking n1 I could get much more dispersed data.
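The discrete approach above is a "broken stick" construction on an integer grid. A continuous analogue avoids the 10^(n-1) grid entirely: cut [0,1] at n-1 sorted uniform points and take the gaps. This is a minimal sketch; the helper name `rand_simplex_cols` is hypothetical and not part of randfixedsum. Note that it samples uniformly on the simplex with lower bound 0 (the same distribution as randfixedsum(n,m,1,0,1)), so by itself it will not make the data any more "dispersed".

```matlab
% Hypothetical helper: n-by-m matrix whose columns are uniformly distributed on
% the unit simplex (entries >= 0, each column sums to 1). The gaps between
% n-1 sorted uniform cut points, plus the endpoints 0 and 1, give n pieces.
rand_simplex_cols = @(n, m) diff([zeros(1, m); sort(rand(n - 1, m), 1); ones(1, m)], 1, 1);

P = rand_simplex_cols(10, 10000);   % 10-by-10000, no 10^(n-1) grid needed
max(abs(sum(P, 1) - 1))             % only floating-point round-off remains
```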

Accepted Answer

John D'Errico on 28 Jun 2020
Edited: John D'Errico on 28 Jun 2020
I think you do not understand what you are asking.
randfixedsum indeed produces results that are uniformly distributed within the subset in question. That is, any point in the 10-dimensional space that satisfies the requirements (the fixed sum and the bounds) is equally likely to arise.
However, that does not mean you are at all likely to find a sample that satisfies your goal of "dispersion".
For example, suppose one element is greater than 0.5. Then the probability that the other 9 elements are ALL small enough for the sum to be 1 is pretty low. In the 9-dimensional space that remains, that event would actually be very uncommon.
Thus, you want to generate 10 numbers, all of which lie between 0.05 and 0.9, such that the sum is 1.
Suppose, just suppose, that one of the numbers was, say, 0.6. Now what are the odds that you can find 9 other numbers that make the total sum exactly 1, but none of which is less than 0.05? SURPRISE! It can never be done: those 9 numbers would have to sum to 0.4, yet 9 numbers that are each at least 0.05 sum to at least 0.45.
In fact, if any single element were larger than 0.55 in this example, your goal would never be achievable. If one element is as large as even 0.55+eps, it is mathematically impossible to find 9 numbers, all of which are between 0.05 and 0.9, such that their sum is 0.45-eps.
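The 0.55 bound generalizes directly: with n elements in [a, b] summing to s, the other n-1 elements contribute at least (n-1)*a and at most (n-1)*b, which pins down the range any single element can take. A small sketch (the variable names are illustrative only):

```matlab
% Feasible range of one element when n elements must each lie in [a, b]
% and sum to s. The remaining n-1 elements sum to between (n-1)*a and (n-1)*b.
n = 10; s = 1; a = 0.05; b = 0.9;
x_hi = min(b, s - (n - 1)*a);   % largest value any single element can take
x_lo = max(a, s - (n - 1)*b);   % smallest value any single element can take
fprintf('each element must lie in [%.4f, %.4f]\n', x_lo, x_hi);
```

With the question's numbers this gives [0.05, 0.55], so the requested upper bound of 0.9 can never actually be reached.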
Next, suppose one element is as large as 0.5. Just one element that large.
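Now the other 9 elements must all be very close to 0.05 (together they must sum to 0.5 while each is at least 0.05, so none can exceed 0.1). What is the probability of that event? Not surprisingly, it is pretty darn small. I could compute the exact probability of such an event if you need it. Being too lazy to think at this time of day, here is a quick simulation instead (note that it uses a lower bound of 0 rather than 0.05):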
X = randfixedsum(10,10000000,1,0,0.9);
sum(max(X) >= 0.5)
ans =
195844
So there were about 1.96e5 such events in 1e7 samples, a little under 2% of the time. As expected, it is a rare event, and that is EXACTLY as it should be.
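For the lower-bound-0 case simulated above, the exact probability is easy to get. For X uniform on the simplex {x >= 0, sum(x) = 1} in n dimensions, each coordinate satisfies P(X_i >= t) = (1-t)^(n-1), and for t >= 1/2 at most one coordinate can exceed t, so P(max(X) >= t) = n*(1-t)^(n-1). This sketch ignores the upper bound 0.9, which only excludes an event of probability at most 10*0.1^9 = 1e-8. It gives 10*0.5^9 ≈ 0.0195, i.e. about 195,300 expected events in 1e7 samples with a binomial standard deviation of roughly 440, consistent with the simulated 195844.

```matlab
% Exact P(max(X) >= t) for X uniform on {x >= 0, sum(x) = 1}, n elements.
% Valid for t >= 1/2 (the events X_i >= t are then disjoint); upper bound ignored.
n = 10; t = 0.5; N = 1e7;
p_exact = n*(1 - t)^(n - 1);               % 10 * 0.5^9
expected_count = N*p_exact;                % compare with the simulated 195844
sd_count = sqrt(N*p_exact*(1 - p_exact));  % binomial standard deviation
fprintf('P = %.6f, expected %.1f +/- %.1f events in %g samples\n', ...
        p_exact, expected_count, sd_count, N);
```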
You ask for dispersion. But you don't seem to understand what dispersion means or what it implies in this context.
If I look at the distribution of the maximum of the 10 elements in each column, I get something that is actually pretty reasonable.
X = randfixedsum(10,10000,1,0.05,0.9);
Distribution of max(X) over the 10000 columns:
Min     0.1207
1.0%    0.1342
5.0%    0.1445
10.0%   0.1524
25.0%   0.1674
50.0%   0.1884
75.0%   0.2167
90.0%   0.2503
95.0%   0.2738
99.0%   0.3143
Max     0.4039
Most of the time, we get a maximum value that is pretty small in context. That is because the sample truly is uniformly distributed over the constraint set: any point in that set is as likely to arise as any other point. But that does NOT mean that the maximum is ever likely to be larger than 0.55. In fact, that would be an impossible event.
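Because b = 0.9 is above the feasible maximum s-(n-1)*a = 0.55, the upper bound never binds here. A uniform point on {x_i >= 0.05, sum(x) = 1} is then just a + (s-n*a)*U with U uniform on the unit simplex, and the expected maximum of U is H_n/n (H_n the n-th harmonic number). The sketch below uses that shortcut instead of randfixedsum, purely to illustrate where the statistics above come from; it is not how randfixedsum works internally and only applies when the upper bound is inactive.

```matlab
n = 10; m = 10000; s = 1; a = 0.05; b = 0.9;
assert(b >= s - (n - 1)*a, 'upper bound would bind; this shortcut does not apply');
% Uniform on the unit simplex via sorted-uniform spacings ("broken stick").
U = diff([zeros(1, m); sort(rand(n - 1, m), 1); ones(1, m)], 1, 1);
X = a + (s - n*a)*U;                          % shift/scale onto the constraint set
mx = max(X, [], 1);                           % per-column maximum
mean_exact = a + (s - n*a)*sum(1./(1:n))/n;   % E[max] from the spacing formula
fprintf('simulated mean of max: %.4f, exact: %.4f, largest seen: %.4f\n', ...
        mean(mx), mean_exact, max(mx));
```

The exact mean is about 0.196, slightly above the 50% value in the table because the distribution of the maximum is right-skewed; the largest value seen can never exceed 0.55.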
Suppose instead that we change the way things are generated: instead of requiring the minimum to be 0.05, just make it 0. How do the statistics change?
X = randfixedsum(10,10000,1,0,0.9);
Distribution of max(X) over the 10000 columns:
Min     0.1395
1.0%    0.1681
5.0%    0.1902
10.0%   0.205
25.0%   0.2353
50.0%   0.2784
75.0%   0.3359
90.0%   0.401
95.0%   0.4492
99.0%   0.5479
Max     0.8123
As you can see, the maximum element is now considerably larger; in a sample of the same size, I got one as large as 0.8123. There is now much more room for those "dispersed" events to arise.
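To put a number on "much more room", the same spacing argument extends to a lower bound a: writing X = a + (s-n*a)*U, the event X_i >= t becomes U_i >= u with u = (t-a)/(s-n*a), so P(max(X) >= t) = n*(1-u)^(n-1) whenever 1/2 <= u <= 1. This is a sketch that ignores the upper bound b (for b = 0.9 the error is at most about 1e-8); the anonymous function `p_max_ge` is a hypothetical name, and it does not check the validity range itself.

```matlab
% Hypothetical helper: P(max(X) >= t) for X uniform on {x_i >= a, sum(x) = s}
% with n elements, ignoring the upper bound. Only valid when
% u = (t - a)/(s - n*a) lies in [1/2, 1]; the caller must check this.
p_max_ge = @(t, n, s, a) n*(1 - (t - a)/(s - n*a))^(n - 1);

n = 10; s = 1; t = 0.5;
p_a0  = p_max_ge(t, n, s, 0);     % lower bound 0:    u = 0.5, 10*0.5^9
p_a05 = p_max_ge(t, n, s, 0.05);  % lower bound 0.05: u = 0.9, 10*0.1^9
fprintf('P(max >= 0.5): a = 0 -> %.3g, a = 0.05 -> %.3g\n', p_a0, p_a05);
```

Raising the lower bound from 0 to 0.05 drops the chance of seeing any element of 0.5 or more from about 2% to about 1e-8, which is why 10000 samples essentially never show one.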
J AI on 28 Jun 2020
Edited: J AI on 28 Jun 2020
Oh wow. I really appreciate your detailed, painstaking explanation. I can see how I got the whole thing messed up with my requirements. Thank you so much for clearing it up with such clarity.


Category: Creating and Concatenating Matrices

Release: R2020a
