how to generate a random vector with fixed sum and bounded elements

I want to generate a random N-element vector with a fixed sum, sum(x) = Etotal, where each element is bounded as 0 <= x(i) <= d(i). Etotal is the fixed sum and d specifies the upper bounds on the elements of the vector x. I would appreciate any guidance.

Accepted Answer

Maybe I am not understanding something, but it seems like your problem is overconstrained.
You can easily make a length N vector whose elements are bounded by d, for example using a uniform distribution:
x = rand(N,1).*d % where d is a vector of upper bounds for each element
You can also make a random vector with a specified sum using:
x = rand(N,1);
xfixedSum = x/sum(x)*Etotal
But it seems that in general it is not possible to satisfy both constraints at once.

12 Comments

The answer addresses the problem partially, but I need further guidance on whether it is really an over-constrained problem, and any hints on how to relax it.
Jon
Jon on 25 Feb 2020
Edited: Jon on 25 Feb 2020
I think that it is clear that your problem as described is over-constrained. For example, suppose N = 3, Etotal = 10, and d1 = d2 = d3 = 1. Clearly the largest total you could achieve would be 3, which is less than the required value of 10.
Thanks again for your time. I think I missed something in explaining the problem. The problem is set up so that sum(d) >= Etotal, so there should always be a solution. Is this still an over-constrained case?
So the problem is not necessarily over constrained, it all depends upon your choice of Etotal and d.
A necessary condition to have a solution is sum(d) >= Etotal; that is, if the components are too small there is no way they can sum up to Etotal.
Here is one approach to generating a set of uniformly distributed random vectors that satisfy the constraints.
I just have it here as a script, but you could turn this into a function if that is what you need.
% generate uniformly distributed random vectors, e, satisfying
% sum(e) = Etotal
% 0 <= e <= d
% use rejection method
numNeeded = 10; % number of random vectors that are needed
Etotal = 6;
d = [1; 4; 3]; % upper bound on components
maxTries = 1e6; % maximum number of attempts to find feasible vectors
N = length(d); % number of components
% make sure that it is possible to find solutions
if sum(d) < Etotal
    error('No solution possible, try increasing components of d')
end
% enter loop that generates trial points until it finds enough
% that satisfy the constraints
% preallocate array to hold vectors satisfying constraints
% each column of E is a vector satisfying the constraints
E = zeros(N,numNeeded);
count = 0;
numTries = 0;
while count < numNeeded && numTries < maxTries
    % make random length N-1 vector that satisfies bounds (inequality
    % constraints)
    e = rand(N-1,1).*d(1:N-1); % scale the components so 0 <= e <= d(1:N-1)
    % calculate required last component to satisfy the equality constraint
    % on the sum
    eN = Etotal - sum(e);
    % check if last component satisfies bounds
    if 0 <= eN && eN <= d(N)
        % satisfies constraints, add it to the output
        count = count + 1;
        E(:,count) = [e;eN];
    end
    % increment try counter
    numTries = numTries + 1;
end
% check if enough solutions were found
if count < numNeeded
    error(['maximum tries were exceeded,' ...
        ' try increasing bounds on d or maximum allowed number of tries'])
end
% display result
disp(E)
Digging a little more, here is an interesting, but very technical, paper that discusses this problem, in case it is of interest: http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf
Jon, once again thank you very much for the kind help, especially the code and the reference material. Now I am able to generate the desired output. But the approach suffers badly as the difference (sum(d) - Etotal) narrows and the value of N increases. Is adaptive rejection sampling a better choice here, or is there another approach?
Yes, I can imagine that this somewhat "brute force" rejection method becomes inefficient as sum(d) - Etotal narrows and the value of N increases.
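To see how quickly the rejection scheme degrades, here is a quick cross-check in Python rather than MATLAB (the function name and the second test case are my own, chosen for illustration): it estimates the acceptance rate of the scheme above, first on the roomy 3-component problem from the script, then on a tighter 8-dimensional one.

```python
import random

def acceptance_rate(d, Etotal, trials=20000, seed=0):
    # Estimate the acceptance rate of the rejection scheme above:
    # draw the first N-1 components uniformly in [0, d(i)], set the
    # last one to Etotal - sum(e), and accept only if it is in [0, d(N)].
    rng = random.Random(seed)
    accepted = 0
    for _ in range(trials):
        e = [rng.random() * di for di in d[:-1]]
        eN = Etotal - sum(e)
        if 0.0 <= eN <= d[-1]:
            accepted += 1
    return accepted / trials

# roomy problem from the script above: acceptance stays usable
loose = acceptance_rate([1, 4, 3], 6)
# tight slack in 8 dimensions: acceptance is essentially zero
tight = acceptance_rate([1] * 8, 7.9)
```

With the wide bounds roughly a third of the trials succeed, while the tight high-dimensional case almost never does, which matches the behavior reported here.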
I'm not sure how critical the performance is for your application, or how bad it gets. Of course, if the run time is still acceptable, you could live with the inefficient algorithm, but it would be nicer to have a better one.
I'm sorry, but I do not have any special experience regarding algorithms for this purpose, and so can't offer you additional help there. If you find one that you would like to implement, and have issues with your MATLAB implementation of the algorithm, you could definitely post that and would likely be able to get additional help.
Good luck with your project
Jon, I am indebted to you for your help; in fact, it gave me a big jump start. The solution you provided serves my purpose at least in one part of the problem. I am trying to find some workaround.
Hi Saifullah,
This problem has been intriguing me, and I think I developed a nice solution, if you still need it. I've attached some example code with a lot of comments but here's the overview.
We are trying to sample points on the intersection of the hyperplane defined by sum(e) = Etotal and the bounding box defined by 0 < e < d. The idea is to generate only points that lie on the hyperplane and are inside of the bounding box. We do this by first finding a particular solution that is on the hyperplane and inside of the bounding box. Using this as an origin we then generate points that are on the hyperplane using random vectors that are in the null space of [1 1 1 1 ... 1]*e = Etotal. We scale these to ensure that they do not extend beyond the boundaries of the bounding box.
The particular solution e0 is easily found when we recognize that the origin and the maximum corner of the bounding box are on opposite sides of the hyperplane. We can see this by noting that for the origin, sum([0 0 0 ... 0]) <= Etotal, and for the maximum bounding box corner, sum([d(1) d(2) d(3) ...]) >= Etotal. So the line connecting these two points, which is fully in the interior of the bounding box, must intersect the hyperplane somewhere inside of the bounding box. The vector whose tip is the intersection point is easily solved for as e0 = Etotal/sum(d)*d; this gives the particular solution we need to get started.
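The particular-solution formula is easy to verify numerically. A minimal Python sketch (the variable names are mine; d and Etotal are taken from the example script in this comment):

```python
Etotal = 7.0
d = [1.0, 0.5, 1.0, 0.9, 0.25, 1.0, 0.5, 2.0]  # upper bounds, as in the script

assert sum(d) >= Etotal  # feasibility: the max corner is on or above the hyperplane

# scale the max-corner vector back until it hits the hyperplane sum(e) = Etotal
e0 = [Etotal / sum(d) * di for di in d]

# e0 is on the hyperplane ...
assert abs(sum(e0) - Etotal) < 1e-9
# ... and inside the box, since the scale factor Etotal/sum(d) is in [0, 1]
assert all(0.0 <= x <= di for x, di in zip(e0, d))
```

The same check is what the MATLAB script performs implicitly via its feasibility test and its e0 construction.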
Please give the attached example a try, and play around with different bounding boxes and Etotals. I'm not sure if I've covered all edge cases, but it's worked for all of the examples I've tried. Please let me know if you have any questions.
I would guess that I may have reinvented the wheel here, and that this approach has been documented elsewhere but I didn't quickly come across any references which used this algorithm. I'd be interested if you find one.
% generate uniformly distributed random vectors, e, satisfying
% sum(e) = Etotal
% 0 <= e <= d
% Generate only feasible points using null space bounding box calculation
% Jon Simon 2020-03-04
numNeeded = 10000; % number of random vectors that are needed
Etotal = 7;
d = [1; 0.5; 1; 0.9; 0.25; 1; 0.5; 2]; % upper bound on components
N = length(d); % number of components
% make sure that it is possible to find solutions
if sum(d) < Etotal
    error('No solution possible, try increasing components of d')
end
% find particular solution in the interior of the bounding box
% define vector that extends from origin to maximal corner of bounding box
v = d;
% find intersection of vector with hyperplane where sum(e) = Etotal
e0 = Etotal/sum(d)*v;
% find null space of solutions
A = null(ones(1,N));
% translate coordinates of bounding box
% to a frame where origin is at e0
dU = d - e0;
dL = -e0; % original lower bound is zero
Bbnd = [dL dU]; % box bounds in translated coordinates
% generate random feasible points that are on the hyperplane and in the
% interior of the bounding box
E = zeros(N,numNeeded); % preallocate array to hold vectors
for k = 1:numNeeded
    % find random vector in null space
    w = 2*rand(N-1,1)-1; % ranges from -1 to 1
    e = A*w;
    % random point will be found in direction of e, but need to determine
    % allowable scaling so that it will be inside of the bounding box
    % for each component of e find matrix of scaling factors Alpha, where
    % Alpha(i,j)*e(i) = Bbnd(i,j); Alpha(i,1) is the scale factor
    % that would bring the ith component of Alpha(i,1)*e to the ith lower bound
    % and Alpha(i,2) is the scale factor that would bring the ith component
    % of Alpha(i,2)*e to the ith upper bound
    Alpha = Bbnd./e;
    % find the allowable range of alpha; note that since 0*e is a point that is
    % on the hyperplane and in the interior of the bounding box, alpha = 0 is always in the
    % allowable range, so the lower bound on alpha is a negative value and
    % the upper bound is a positive value
    alphaL = max(Alpha(Alpha(:)<=0)); % greatest lower bound on Alpha, use (:) to look at elements as a column
    alphaU = min(Alpha(Alpha(:)>=0)); % least upper bound on Alpha
    % randomly choose a value of the scale factor that is within the
    % allowable range
    alpha = alphaL + rand*(alphaU-alphaL);
    % generate point on hyperplane that is inside of bounding box
    E(:,k) = e0 + alpha*e; % particular solution plus null space vector
end
% check
colSums = sum(E) % gives column sums, should all equal Etotal
minComp = min(E,[],2) % gives minimum of each component over all of the points in E, should all be >= zero
maxComp = max(E,[],2) % gives maximum component over all of the points in E, should all be <= d
% display sum(d)-Etotal, which gives an indication of the difficulty of the problem
disp({'sum(d)-Etotal = ',sum(d)-Etotal})
% if successful all of these should be true
check1 = all(abs(colSums - Etotal) < 1e-10) % allow some tolerance
check2 = all(minComp >= 0)
check3 = all(maxComp <= d)
I guess I was too late to see the question posed, but yes, it is quite possible to solve the problem where bounds are applied, and you have a sum constraint. As well, the problem is quite efficiently solvable, using the correct algorithms.
Jon
Jon on 4 Mar 2020
Edited: Jon on 4 Mar 2020
@John Please look at my last algorithm, which I just posted before your comment. I'm sure it could be further optimized, but overall I think it is quite efficient, with no rejection and very few operations.
Jon, thank you very much for all your kind support. I spent some time understanding the solution yesterday, as things were well beyond my mathematical know-how. Thanks for your tons of comments about every step. Now I am trying to establish the randomness properties of the generated data, and I also want to understand the underlying math in a more formal way. I hope John or you can kindly comment on these aspects.

Sign in to comment.

More Answers (2)

If your bounds are the same for all elements, then use randfixedsum. It is amazingly efficient, even for high dimensional problems. The number of dimensions is N in this sort of problem.
For example, to find 3 vectors, of length 10, such that the samples are uniformly distributed over the set of possible solutions, with a sum of 1, lower bound of 0, and upper bound of 1...
X = randfixedsum(10,3,1,0,1)
X =
0.11264 0.16122 0.1933
0.090028 0.13915 0.076298
0.10823 0.02528 0.024352
0.15859 0.31793 0.028457
0.062117 0.09639 0.056485
0.013329 0.051958 0.025118
0.10416 0.058997 0.3023
0.010921 0.021139 0.11247
0.21689 0.0064835 0.040133
0.12309 0.12146 0.14109
sum(X,1)
ans =
1 1 1
As I said, randfixedsum is quite efficient. The time to find 500 vectors, each of length 1000, is about as good as you can possibly hope for.
timeit(@() randfixedsum(1000,500,1,0,1))
ans =
0.062201
So only .06 seconds for the call. Find randfixedsum on the file exchange, here:
A problem arises when your bounds are not the same for all elements of the resulting vectors, or if the linear combination chosen is not just a simple sum. In these cases, as long as N is not too large, use randFixedLinearCombination.
For example, suppose we want to find vectors of length 5, such that the sum is also 1, but where the elements have upper bounds given by the vector [.1 .2 .3 .4 .5]?
X = randFixedLinearCombination(3,1,ones(1,5),0,[.1 .2 .3 .4 .5])
X =
0.034225 0.15026 0.21493 0.27429 0.32629
0.091852 0.049101 0.27878 0.37448 0.20579
0.032732 0.086649 0.23589 0.36987 0.27486
sum(X,2)
ans =
1
1
1
So the first element lives in [0, 0.1], the 5th element lives in [0, 0.5], etc.
randFixedLinearCombination is not as efficient when the dimension N gets large however.
For example, it is extremely good in a low number of dimensions:
timeit(@() randFixedLinearCombination(3,1,ones(1,10),0,1))
ans =
0.00069653
>> timeit(@() randFixedLinearCombination(3,1,ones(1,15),0,1))
ans =
0.0067253
>> timeit(@() randFixedLinearCombination(3,1,ones(1,20),0,1))
ans =
0.39572
But as I start to push things a bit, say 20 dimensions, things get slower.
Find randFixedLinearCombination here:
A virtue of both codes is they assure a result that is uniformly distributed over the domain of interest.
Be very careful however, as many solutions are not so good, especially ones that seem trivial. For example, you might consider the classic rescaling solution, something that someone on the site is always fast to propose.
X = rand(1,3);
X = X./sum(X)
X =
0.55967 0.0077895 0.43254
It seems REALLY easy to do. And I will admit, it is fast. One problem is, the result is a set of vectors that are not uniformly distributed over the domain of all possible solutions. Worse, they may not satisfy the bound constraints as required. So, if we try plotting the result of 10000 such samples, we would see...
X = rand(10000,3);
X = X./sum(X,2);
plot3(X(:,1),X(:,2),X(:,3),'.')
grid on
box on
The result is not uniformly distributed over that set. As you can see, the samples are considerably less dense in some regions of the implied constraint plane.
randfixedsum and randFixedLinearCombination have no such problem however.
Y = randFixedLinearCombination(10000,1,ones(1,3),0,1);
plot3(Y(:,1),Y(:,2),Y(:,3),'.')
grid on
box on
As I said, another problem is rescaling can mess up the bound constraints. That is never a problem with the two tools I have recommended however.

15 Comments

Hi John,
In my last algorithm (please see above), I find a particular solution inside of the bounding box, and then find random points satisfying the bounding box constraints by adding appropriately scaled random vectors in the null space of [1 1 1 1 .. 1]. I just tried my algorithm to compare to your example of
randFixedLinearCombination(3,1,ones(1,20),0,1)
which you said took about 0.4 seconds.
I used my code to find 3 vectors of length 20, each of which sums to 1, and whose components are between 0 and 1, which I think is a comparable problem. I find it only takes about 0.002 seconds (on a good laptop), so I don't think it suffers too much from the high dimensionality. Actually, I get similar execution times even if the feasible region is much smaller, for example if I have the same bounds and require the components to sum to 19.
If you're interested to try it out, or to look at the correctness of the algorithm, I would value hearing about your experience. The one subtle point that I'm not sure how to assess is the extent to which the points I generate are uniformly distributed. Intuitively, it seems like they would be, but for really assessing this I am out of my depth. Note however that I don't just use a simple rescaling as you discuss in your post.
John, thank you very much for your response. In my problem, I require vectors longer than 20 dimensions. I hope you can kindly look at the solution proposed by Jon, both for the randomness of the output and the formal math behind the technique.
Jon - Time is very dependent on what is running in the background, of course. In fact, (sheepish admission) I was probably running Python flat out on the side, searching for huge prime numbers from a special family at the same time as I made that test. I'm midway through a 10 hour run right now, and I am often found to be running both MATLAB and Python engines flat out on my system. So my time estimate may have been unthinkingly wrong.
A uniform distribution in a high number of dimensions is difficult to assess. It is far easier to assess when you can plot things, as I did in 3-d. And it much depends on how much you need uniformity. But remember that a strongly non-uniform sample can be a problem, since it will result in parts of the sample space being heavily under-represented. Depending on what you are doing, it may be important, or just a so what? Are you doing a Monte Carlo integration (or something that is essentially equivalent to that)? If so, then non-uniformity can result in a biased estimate. Sometimes, you just want a few sets of numbers that look sort of random, but do need to satisfy some important property.
True uniformity is, as I said, sometimes difficult to predict. For example, we can visualize why the simple rescaling solution fails for the sum == 1 case: we sample initially from a unit hypercube using rand, but then project the samples down to a plane. Unfortunately, the plane we project onto is not a "good" choice for uniformity.
xy = rand(1000,2);
plot(xy(:,1),xy(:,2),'.')
hold on
line([0 1],[1 0])
axis equal
axis tight
The projection here is one that projects each point onto the diagonal line. As you can see, there are far fewer points that will yield a result near the ends of the line, than there are points that will land near the center. So now if we look at the distribution of the projected points along that line, it is not even remotely uniform.
I'll redo the sample here to be much larger so we get a better plot. It is simple here just to view the marginal distribution for x only:
xy = rand(1000000,2);
xyc = xy./sum(xy,2);
hist(xyc(:,1),1000)
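The same conclusion can be reached without plotting at all. A small Python simulation of the same experiment (the window widths are my own choice): after rescaling pairs of uniforms to sum to 1, the first coordinate piles up near 0.5 and thins out near the ends.

```python
import random

rng = random.Random(0)
center = edge = 0
n = 100000
for _ in range(n):
    x, y = rng.random(), rng.random()
    z = x / (x + y)  # first coordinate after rescaling the pair to sum to 1
    if 0.45 < z < 0.55:   # window around the middle of [0, 1]
        center += 1
    elif z < 0.10:        # equally wide window at the low end
        edge += 1
```

Both windows have width 0.1, yet the center one catches roughly three times as many samples as the edge one, which is exactly the peaked histogram shown above.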
However, suppose we just want to sample points that lie uniformly on the surface of a unit hyper-sphere in n-dimensions? Now the best scheme really is to do a simple re-scaling of a Gaussian sample.
nsamples = 1000;
n = 3;
X = randn(nsamples,n);
X = X./sqrt(sum(X.^2,2));
This works of course because the multi-variate Gaussian PDF is symmetric around the origin. But many people do not appreciate the difference, nor do they always care. ;-) The nice thing is, this is easy to show, but subtleties in a lengthy code can easily complicate the question.
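For contrast with the MATLAB snippet above, here is the same Gaussian-rescaling idea in plain Python (the function name is mine); every rescaled point lands exactly on the unit sphere, in any dimension:

```python
import math
import random

def sample_sphere(n, nsamples, seed=0):
    # Draw standard normals and rescale each point to unit length; this is
    # uniform on the sphere because the multivariate Gaussian density is
    # spherically symmetric about the origin.
    rng = random.Random(seed)
    pts = []
    for _ in range(nsamples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        r = math.sqrt(sum(v * v for v in x))
        pts.append([v / r for v in x])
    return pts

pts = sample_sphere(5, 1000)
```

Note that this rescaling is legitimate precisely because of the symmetry argument above, unlike the sum-rescaling criticized earlier.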
@Saifullah - Really, we need to know if the difference is worth the time and energy spent to verify if the distribution is uniform. If you are willing to accept something that looks good enough for the task at hand, then there is no need to look further. It is crucial to know the goals of any programming task.
Jon gave you code that does produce a random result. I can fairly easily prove that the result of the code I wrote is uniform in n-dimensions, and I think I recall convincing myself of that for Roger Stafford's code, long ago. But if you want to show that what Jon did is valid in n-dimensions, a good start would be to look at what it produces in 2 or 3 dimensions. Does a low-dimensional case at least appear to be reasonably uniform? You can see the difference in the examples I've given in 3-dimensions. For 1e6 samples from randFixedLinearCombination (or randfixedsum, for that matter) we see a nice flat histogram.
X = randFixedLinearCombination(1000000,1,ones(1,2),0,1);
hist(X(:,1),1000)
So I would start with a simple case to at least give yourself a warm fuzzy feeling. (Mathematics for me is often preceded by a careful use of MATLAB to give myself that warm fuzzy, before I ever invest the energy to do any real thinking.) Does the 2-d or 3-d case result in something that at least looks reasonable? If not, AND that is important to you, then you need to spend the time to use a better method, or to improve the code proposed by Jon.
Hi John,
Thank you so much for your detailed reply.
I did a quick look as you suggested using lower dimensional cases, and it would at least initially appear that my results are not uniformly distributed over the portion of the plane that satisfies the constraints.
Here are the results for
numNeeded = 10000; % number of random vectors that are needed
Etotal = 3; % components must sum to this value
d = [2; 0.5; 1;]; % upper bound on components, lower bound is zero for all components
The points seem denser near the particular solution that provides the local origin for generating my samples. (If you look closely you can see a small red * where the particular solution is located.)
So it seems that my approach generates points that satisfy the constraints, but maybe does not provide uniform sampling.
I don't know how important this is for Saifullah's application, but in any case it is disappointing to me, as I hoped I had found a nice way to produce uniform samples. I'm not sure if the approach could be modified to improve this, or if it is a fundamental limitation. I could think about it a little more, but at the moment I don't really have a good grasp on what it is about how I'm generating the points that leads to the non-uniformity. If you had time to look and had any insights, I would certainly be appreciative.
Hmm. I took a quick read through your code. One compliment I'll offer: you used good comments, sufficient to make your code readable, so that I can easily see what you are doing. That, combined with the picture you provide, is enough for me to see where you went wrong. I'm happy you took my advice about a low-dimensional test. This can often give some intuition into the problem, as I think it did. And here it does convince me that your scheme is non-uniform, as I wondered it might be.
The discussion will take my usual lengthy explanation, done later today after I return from my bridge game. Life must go on, after all. ;-) But I might then even be able to offer a solution to make the general scheme you chose still work, and also be uniform.
Jon
Jon on 6 Mar 2020
Edited: Jon on 6 Mar 2020
Here's what I think is an improved variant of my original idea. Here I generate pairs of randomly selected points on the boundary of the intersection, and then find a randomly selected point that is a convex combination of these two points. Please try the attached file if you have a chance.
In the earlier approach I first found a particular solution, x0, that was in the interior of the bounding box. I would then find a vector, v, with a random direction that was in the hyperplane satisfying the sum constraint. I would then randomly scale v (using a uniformly distributed length) so that when added to x0 it would be between x0 and the boundary. I think the problem with this regarding uniformity is that for directions where the boundary is close, a randomly chosen point will be more likely to be near the origin than for directions where the boundary is far away. So in the end I would have more points close to the origin (x0) in the directions that were closer to the boundary.
In the modified approach, I generate a pair of randomly selected points on the boundary and then pick (using a uniform distribution) a point that is "between" these. Intuitively this seems like it should be better since we sample lines that cross all over the sampled region, some long some short. I think by eye it looks better, but this uniformity concept seems to be quite subtle, so maybe this approach also has shortcomings.
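To make the pairs-of-boundary-points idea concrete, here is a hedged pure-Python sketch of it (this is my own illustration, not the attached MATLAB file, and it does not yet exclude the second point from landing on the same boundary): pick two random boundary points of the hyperplane-box intersection, then a uniform point on the segment between them.

```python
import random

def sample_pairs(d, Etotal, nsamples, seed=0):
    rng = random.Random(seed)
    N = len(d)
    # particular solution on the hyperplane, inside the box (as derived earlier)
    e0 = [Etotal / sum(d) * di for di in d]
    # a basis for the null space of [1 1 ... 1]: adjacent-difference vectors
    basis = []
    for k in range(N - 1):
        v = [0.0] * N
        v[k], v[k + 1] = 1.0, -1.0
        basis.append(v)

    def boundary_point():
        # random direction lying within the hyperplane (sums to zero)
        coef = [rng.gauss(0.0, 1.0) for _ in basis]
        w = [sum(c * b[i] for c, b in zip(coef, basis)) for i in range(N)]
        # largest step alpha > 0 keeping e0 + alpha*w inside 0 <= e <= d
        alpha = min((di - x) / wi if wi > 0 else -x / wi
                    for x, wi, di in zip(e0, w, d) if wi != 0.0)
        return [x + alpha * wi for x, wi in zip(e0, w)]

    samples = []
    for _ in range(nsamples):
        p1, p2 = boundary_point(), boundary_point()
        t = rng.random()  # uniform point on the segment from p1 to p2
        samples.append([a + t * (b - a) for a, b in zip(p1, p2)])
    return samples

E = sample_pairs([2.0, 0.5, 1.0], 3.0, 500)
```

Every sample satisfies the sum constraint and the box bounds by construction, since the box-plane intersection is convex; whether the samples are uniform is exactly the open question of this discussion.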
I have a feeling this must be something that mathematicians have already looked at in detail, and I'm probably somewhat blindly trying to reinvent the wheel here. Looking into this literature is a little daunting though, and I'm just looking at the problem because it intrigues me. I'm also not sure how rigorous the original OP needs the results to be, and what is good enough. Anyhow, I'd certainly be happy to hear your ideas.
Once again I am back, filled with gratitude for both of you for being exceptionally considerate and helpful. As far as the requirement of randomness in my original problem is concerned, I think a good-enough approach will probably be OK, as there is another algorithmic component that can help with it to some extent. Also, I think all of this discussion can help anybody dealing with similar issues.
Hi Saifullah, well hopefully the last approach attached above (hboxPlaneSampling.m) will be helpful. Anyhow, you can give it a try. Maybe John D'Errico will have some further suggestions when he gets back.
Realized one issue with the above code. The pair of points may intersect the same boundary. In this case the line connecting them runs along the boundary. Since a sample is generated at a random point along this line, it will also be on the boundary. This results in some oversampling of the boundary. Instead of using a pair of random points on the boundary, I could find the first point on the boundary as before, but then to find the second point on the boundary I could look along a direction perpendicular to the first, so I wouldn't hit the same boundary.
Yes, Jon, I also realized the same issue, that is, too many points fall on the boundaries. I am evaluating a scheme where I produce random points using the randfixedsum routine. In the first step, I generate a fixed sum that is different from Etotal, and then I use a minimum norm solution to do a sort of back-filling, and thus reach out to the boundaries and fulfill the sum constraint. So far the results are encouraging; if they stay good, I will share the details shortly.
OK Here is an improved version that eliminates the oversampling of the boundaries. Please give this a try. In the new approach I still generate a random pair of points at the boundary and then select a random point on the line that connects them, but now I make sure the second point is not on the same boundary.
Unfortunately, I think I can see that there may be somewhat of a bias towards having points that are near the boundaries that are closest to the particular solution. When you do a random "spin" to get the first point, it is more likely to point at a boundary that is close to the local origin (the particular solution), as the "included angle" will be bigger for hitting that boundary than one that is far away. The second point, which I don't allow to be on the same boundary, is then most likely to be on the next nearest boundary. The sampled point is then generated on the line that connects these. So I think more of the points will be located between the two boundaries that are closest to the local origin. Oh well. Sorry, I guess this problem is really challenging. I probably shouldn't post any more until/unless I have really thought it through. I will leave it for now.
In any case, here are the results using
numNeeded = 10000; % number of random vectors that are needed
Etotal = 3;
d = [2; 0.5; 1;]; % upper bound on components
tic
[E,x0] = hboxPlaneSampling03(numNeeded,d,Etotal);
OK, let's see first what the general problem requires.
You want to find a uniformly distributed random point subject to a constraint on the sum of its elements, where the elements are sampled randomly from within specified ranges, which are not all the same. (A lot easier if the lower and upper bounds are the same for all elements, as then Roger's randfixedsum really is the best solution, since it does yield a uniform sample on the domain of interest.)
So first, let's look at the geometry of the problem, in the N-dimensional space of interest. N here is the number of elements in the vector. At best, I'll need to give examples in 2 or 3 dimensions, since I can't plot more than that, and then I'll hope to provide intuition about what is happening in high dimensions.
You can think about the bound constraints as creating a box in N dimensions. If the bounds are all the same, then the box is just a hypercube. But with different bounds for each variable, the box converts to a hyper-rectangle. I suppose you could transform the problem so the hyper-rectangle maps into the unit hypercube, but then the sum constraint transforms into a more general linear combination of the variables, so nothing is really any less complex.
In N dimensions, a simple linear constraint reduces to a hyperplane. I would call it a planar (N-1)-manifold embedded in the N-dimensional space. Thus, in 2 dimensions, the sum constraint reduces to a line. The lower and upper bounds reduce that further into a line segment. Moving up to 3 dimensions, the sum constraint forces all points to fall on a plane in the R^3 space of the elements, so now a 2-manifold, embedded in that R^3 space. I'll plot a few examples, just to give an idea of what is happening. Consider a box with bounds as
0 <= x1 <= 0.5
0.25 <= x2 <= 1
1 <= x3 <= 2
Next, I'll show the rectangular box of interest, combined with the domains of 4 slices through that box, representing 4 different possible sum constraints. (I've used several tools to do this, not all of which are available to others. Sorry about that, but the lack thereof is not relevant to this discussion.) The specific sums I chose were 1.5 (purple), 2 (blue), 2.4 (green), and 3 (yellow).
As you should see, each constraint region is planar. They are in fact parallel planes, though that may not be obvious from this viewpoint.
What is pertinent to this discussion is they are polygonal regions, having a variety of numbers of sides, depending on where the plane intersects the 3-d rectangular region. That aspect is important to the discussion. In fact, I would describe the planes as 2-manifolds, embedded in the 3-d space of the box. The boundary polygon of the regions is a closed 1-manifold, a piecewise linear line or polygon. This is something you need to appreciate, because it will be important when we next talk about sum constraints in 4 or 5 or 20 or 100 dimensions.
For the 3-d box shown above, our goal would be to sample the regions displayed uniformly. That in itself is not too difficult. I do it in the code for randFixedLinearCombination. (You may decide the scheme I used in there is of interest. I think the code is reasonable, well documented, and easy enough to read. Note that, surprisingly, the code may seem a little out of date, since it uses histc instead of discretize, and bsxfun instead of the simpler scalar dimension expansions that MATLAB now allows. One thing that is pertinent are my comments in the code where I explicitly explain why the code would be computationally intractable for large numbers of dimensions, certainly on the order of 40 or larger. Of course, now that I look at my own code, I see one aspect where I could have improved the efficiency just a smidge more. Oh well - add that to my list of things to do.)
Regardless, next, suppose we have a problem where there are more or fewer dimensions? In two dimensions the problem is simple. We have a rectangle now, and the sum constraint results in a line segment, thus a bounded 1-manifold, in a 2-d space. The boundaries of the 1-manifold are a pair of points, the ends of the line segment, which we might think of as a 0-manifold. Our problem here reduces to sampling uniformly along the line segment, thus a relatively trivial task.
In this next figure, you can see the box [0,0.5] X [0.25,1]. I've added line segments there to show the sum constraints, corresponding to sums of 0.5 (black), 0.75 (blue), 1 (yellow, with black dots) and 1.25 (green).
Here you should see the lines are indeed parallel to each other. (In theory, they would all lie at 45 degrees in the figure, except the differential axis scaling confuses that issue.)
The important thing to understand is the box lives in 2-dimensions. The sum constraint reduces that to a 1-manifold, thus a line segment, which is bounded at the ends by a pair of points. What matters from the 2-d and 3-d cases is to see that they are related, that we can now move forward to the 4-d rectangular box problem. Sadly though, I cannot plot most of what happens, and you are now forced to rely on intuition.
Thus, suppose we wish to sample inside a 4-d hyper rectangular box forming the bounds, along with a user specified constraint on the sum? The constraint implies a hyper-planar subset in the 4-d space, so a 3-manifold. The boundaries of the planar region will be a 2-manifold. We can think of it as a polyhedron of sorts. It would look roughly like a soccer ball (ok, a football, to those who do not live in the US) with planar 2-d polygonal facets. The exact shape would depend upon the value of the sum, much like the shape of the polygonal region changed as we changed the sum for the 3-d case.
As an example, remember that I cannot plot the result! (Sorry. Well, I could plot it, but my hyper-dimensional monitor is broken, and will be until Starfleet command sends the necessary parts.) Consider the box defined as:
0 <= x1 <= 0.5
0.25 <= x2 <= 1
1 <= x3 <= 2
0.5 <= x4 <= 1.5
Again, a 4-d hyper-rectangle, so impossible to visualize easily. The best I can do is show the resulting polyhedron, plotted in the now 3-dimensional subspace of the hyperplane, for the above box with the sum set to 3.2.
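Even without a picture, we can still draw uniform samples from that set for a moderate number of dimensions. Here is a hedged sketch (in Python for illustration; the function is my own, not the code under discussion): draw the first N-1 coordinates uniformly in their bounds, solve the last coordinate from the sum, and reject if it falls outside its own bounds. The map from the free coordinates to the hyperplane slice is linear, so accepted points are uniform on the slice; the acceptance rate, however, collapses as N grows, which is the intractability issue mentioned above.

```python
import random

def sample_fixed_sum(lo, hi, s, rng=random, max_tries=100000):
    """Uniform sample from {lo <= x <= hi, sum(x) = s} by rejection.

    Draw the first N-1 coordinates uniformly inside their bounds, set the
    last coordinate from the sum constraint, and accept only when it also
    lands inside its own bounds. Uniform on the slice, but slow for large N.
    """
    n = len(lo)
    for _ in range(max_tries):
        x = [rng.uniform(lo[i], hi[i]) for i in range(n - 1)]
        last = s - sum(x)
        if lo[-1] <= last <= hi[-1]:
            return x + [last]
    raise RuntimeError("acceptance rate too low; too many rejections")

# The 4-d box above, with the sum fixed at 3.2:
lo = [0.0, 0.25, 1.0, 0.5]
hi = [0.5, 1.0, 2.0, 1.5]
point = sample_fixed_sum(lo, hi, 3.2)
```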
I want you to visualize planar polygonal facets here, with varying numbers of sides for each facet. Sadly, the tool I used to plot this shows those polygonal facets in a triangulated form. I hope that is not confusing, but I'm feeling too lazy to remove the triangulations of the facets. The point is to recognize that the goal is now to sample the interior of that volume in a uniform way.
The plot above is about the limit of where I can go graphically. However, in 20 or 100 dimensions, or generally in N dimensions, the above ideas still hold. The sum constraint reduces the solution space to a planar (N-1)-manifold. The boundary of the region, thus where the constraint plane crosses the box of interest, will be a closed, piecewise planar (N-2)-manifold. Yes, it is difficult to visualize, but think of it as a high dimensional "polyhedron" that itself lives in a hyper-planar subspace of the original box. You might call it an (N-1)-dimensional blob, or potato, but one that has flat polytope facets in that subspace. Regardless, that is the set one needs to sample from, the goal being to do so uniformly, so that the density of samples in every part of that subset will be uniform, given a sufficiently large sample size.
I hope that explains the problem under discussion here. The way I chose to solve the problem in randFixedLinearCombination is to triangulate the region of interest, and then sample uniformly from the tessellated sub-domain. (A problem which is not that difficult to solve, which explains the algorithmic decision I chose.) The issue which is relevant here is that my solution will fail when the problem moves into too many dimensions, so Jon wants to solve it using a different, more ad hoc scheme. The problem then becomes to find a scheme that would result in a quasi-uniform sampling of the region of interest, yet also be computationally viable in a high number of dimensions.
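The per-simplex half of that triangulate-then-sample scheme is simple to sketch (in Python for illustration; the function is my own, and picking which simplex to sample would additionally be weighted by simplex volume, which I omit). The spacings between k sorted uniform draws are flat-Dirichlet barycentric weights, which yields a uniform point inside a simplex:

```python
import random

def sample_in_simplex(vertices, rng=random):
    """Uniform point inside the simplex spanned by k+1 vertices.

    The spacings between k sorted Uniform(0,1) draws (padded with 0 and 1)
    are Dirichlet(1,...,1) distributed, i.e. uniform barycentric weights.
    """
    k = len(vertices) - 1
    cuts = sorted(rng.random() for _ in range(k))
    w = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]  # sums to 1
    dim = len(vertices[0])
    return [sum(w[i] * vertices[i][d] for i in range(k + 1))
            for d in range(dim)]

# Uniform points in the triangle with vertices (0,0), (1,0), (0,1):
p = sample_in_simplex([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
```

The expensive part in high dimensions is not this sampling step, but computing the triangulation of the (N-1)-dimensional region in the first place.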
Sigh. I'll post this as a comment, and next, I'll try to take a more serious look at what Jon has tried, in context of what I've explained above. At least I can try to explain why what he has done fails to be uniform, and perhaps to offer some vague idea of how it might be improved.
Hi John, Thanks so much for your really in-depth, and beautifully illustrated, explanation of the underlying geometry of the problem. For some reason I have become really captivated by this problem, although I don't have any direct application for it. It was really great to find that this morning, as the day before I had spent a somewhat distracted afternoon hiking with my dogs trying to visualize all of this in higher dimensions. So your discussion was particularly appreciated. Actually, up until thinking about all of this, I had always just thought of hyperplanes as a natural generalization of planes, and imagined them as infinitely thin sheets (like a plane in 3-d or a line in 2-d), but now I am realizing that they are, in some sense, quite "fat", being only one dimension less than the space they are embedded in. I know that I am going off on a tangent here, but it makes me wonder what is really the defining nature of a plane; maybe that it has a direction perpendicular to it that is not in the set, or that it divides the overall space into two halves, one on each "side" of the plane.
I also just looked briefly at the following relevant link: https://mathoverflow.net/questions/76255/random-sampling-a-linearly-constrained-region-in-n-dimensions, and in case you were interested, ran down the references that it linked to, some of which were broken, and attached them here. The concept in all of those is sampling the bounded region through random walks, which seems quite interesting. I will also say that although I understand the very big picture concept, the details of those papers are well beyond what I could follow.
I also attached an interesting one on the hyperbox-hyperplane intersection problem.
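For intuition, the random-walk idea in those references (hit-and-run, in particular) is easy to sketch, though the actual algorithms and mixing analyses in the papers are far more careful. In Python, with the helper entirely my own: pick a random direction with zero component sum (so the sum constraint is preserved), find how far the current point can move either way before leaving the box, and jump to a uniform point on that chord. Repeated many times, the chain approaches a uniform distribution over the slice.

```python
import math
import random

def hit_and_run_step(x, lo, hi, rng=random):
    """One hit-and-run step on the set {lo <= x <= hi, sum(x) fixed}."""
    n = len(x)
    d = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(d) / n
    d = [di - m for di in d]  # project the direction onto sum(d) == 0
    if all(abs(di) < 1e-12 for di in d):
        return list(x)  # degenerate draw (probability ~0): stay put
    # Intersect the line x + t*d with the box to get the feasible chord.
    t_lo, t_hi = -math.inf, math.inf
    for xi, di, a, b in zip(x, d, lo, hi):
        if abs(di) < 1e-15:
            continue
        t1, t2 = (a - xi) / di, (b - xi) / di
        t_lo = max(t_lo, min(t1, t2))
        t_hi = min(t_hi, max(t1, t2))
    t = rng.uniform(t_lo, t_hi)
    return [xi + t * di for xi, di in zip(x, d)]

# Walk on the 4-d example box, sum fixed at 3.2, from a feasible start:
lo = [0.0, 0.25, 1.0, 0.5]
hi = [0.5, 1.0, 2.0, 1.5]
x = [0.2, 0.5, 1.5, 1.0]  # inside the box, and sums to 3.2
for _ in range(100):
    x = hit_and_run_step(x, lo, hi)
```

Unlike rejection or triangulation, each step here is cheap even in very high dimensions; the open question is how many steps are needed to mix well, which is what those papers analyze.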
I wanted to let you know, too, that in my suggested ideas for solving this problem I wasn't in any way suggesting that your approach wouldn't be the way to go as long as the dimensionality was not limiting. In generating some ideas, I very optimistically thought I could come up with a nice clean way to do this that would not suffer too much from the "curse of dimensionality". I am now quite humbled by the subtleties of the problem.
Thanks again for all of the time you have put into this. As I said, the topic somehow really grabbed me, and your explanations have been really enlightening.
Now that I know some better terms to search under, I found this in the MATLAB file exchange from Tim Benham. Haven't had a chance to try it but perhaps it might be interesting. Looks like it uses the random walk approach (actually has options for various specific methods) https://www.mathworks.com/matlabcentral/fileexchange/34208-uniform-distribution-over-a-convex-polytope
Jon - I'm glad you have found the ideas here to have been stimulating. There are often a variety of ways to solve any problem, with some schemes better for large or small problems. Sometimes that requires an understanding of widely differing areas of mathematics, seeing how you can bring tools from a variety of domains to bear on the problem at hand. And it seems you can almost always find some new way to tweak an algorithm, getting an extra boost.
If you do come up with a nice code, then don't forget to post it on the File Exchange. Spend some time, if you can, to test the properties of that scheme. Is it uniform, or perhaps just very close? Provably so, or is the algorithm a bit too ad hoc to admit any direct proof? How fast is it? Does it perform especially well in a high number of dimensions, or is there some sort of a rough limit on that?


Hi John, In case I have a chance to work on this some more, and for future reference, do you have any advice on quantitative tests that can be made to assess to what extent a collection of high-dimensional points can be considered to have come from a uniform distribution?
In other words, if a scheme isn't uniform by construction (as yours is), is there a way to quantitatively assess how well it is doing in terms of achieving uniform sampling?
I took a quick look, but couldn't find anything quickly accessible in terms of an easily implemented formula, or better yet, something already on the File Exchange. Thanks
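One cheap diagnostic I can imagine (my own sketch, not a formal test from the literature): project both a known-uniform reference sample (e.g. generated by slow rejection) and the candidate sample onto random directions, and take the worst two-sample Kolmogorov-Smirnov distance over those projections. Large values flag non-uniformity, though calibrating a pass/fail threshold would still take some care.

```python
import random

def ks_distance(a, b):
    """Max gap between the empirical CDFs of two sorted 1-d samples."""
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif b[j] < a[i]:
            j += 1
        else:  # advance past tied values on both sides
            v = a[i]
            while i < len(a) and a[i] == v:
                i += 1
            while j < len(b) and b[j] == v:
                j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

def projection_ks(sample_a, sample_b, n_dirs=20, rng=random):
    """Worst KS distance between random 1-d projections of two point sets."""
    dim = len(sample_a[0])
    worst = 0.0
    for _ in range(n_dirs):
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        pa = sorted(sum(di * xi for di, xi in zip(d, p)) for p in sample_a)
        pb = sorted(sum(di * xi for di, xi in zip(d, p)) for p in sample_b)
        worst = max(worst, ks_distance(pa, pb))
    return worst
```

With equal sample sizes n, values near the usual two-sample KS scale of roughly 1/sqrt(n) suggest the candidate is close to uniform, while much larger values indicate a biased sampler.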

Asked: 18 Feb 2020
Answered: Jon, 10 Mar 2020
