MATLAB Answers

1

Understanding bayesopt: 1: Why is the same point tested more than once? 2: Understanding acquisition function

Asked by Vicki Taasti on 18 Jan 2019
Latest activity Edited by Don Mathis on 22 Jan 2019
I am using bayesopt for a 2D optimization problem, and I have several time come across results where only a small portion of the search space is explored. That is the tested points cluster very densely in some regions while other regions are unexplored. The below figure (leftmost subfigure) shows a example of this. I have a constraint saying ba1>ba2, so half of the search space is correctly unexplored. But in allowed region (Probability of feasibility = 1) there is still unexplored regions as shown by the yellow circles I have drawn in the middle subplot. Inspecting the Acquisition function (right plot) it is seen that it is nonzero (and nearly constant) in the unexplored region.
This problem is rather large and takes days to finish, so I created a very simple example to test the behavior of bayesopt, and in the hope to figure out how to set the parameters (the choice of acquisition function and the exploration rate). Unfortunately that just gave me extra questions which I hope you can help me understand.
My example code is:
ba1=optimizableVariable('ba1',[1,100],'Type','integer');
fun=@(x)opt_fun(x);
intial_ba=table(1);
results=bayesopt(fun,ba1,'IsObjectiveDeterministic',true,...
'MaxObjectiveEvaluations',10,'NumSeedPoints',1,'InitialX',intial_ba);
plot(results,'all')
function res=opt_fun(x)
res=2*x.ba1;
end
It is a very simple linear function, and bayesopt() works perfectly and finds the minimum (being ba1=1) in a few iterations.
However, my concern is that after it finds the minimum it starts to re-evaluate this same value over and over again, even though I have specified that the objective function is deterministic.
My questions are concerning the below figure:
Simplecase.png
1: Why do bayesopt testing the same point again (I forced it to start in ba1=1) and already the next (fourth) point is ba1=1 again.
Additional question:
Why is the acquisition function not zero at ba1=1, when this point is already tested?
Also, why is the acquisition function not increasing in the region where the errorbars increases? Instead it is (almost) zero.
To show the behavior I removed the last two options
results=bayesopt(fun,ba1,'IsObjectiveDeterministic',true,'MaxObjectiveEvaluations',10);
And got the following results:
Output.png
As you see ba1=1 is tested four times in a row.
I have played around with the Acquisition function and Exploration ratio but it still test 1 several times.
I hope you can enlight me. Thanks in advance.

  0 Comments

Sign in to comment.

3 Answers

Answer by Don Mathis on 18 Jan 2019

I'm not completely sure, but I think the lack of exploration in your small example may be a kind of "model overconfidence". The model believes that there is less chance of improvement far from 1 than there may actually be. This may occur because the GP fitting algorithm uses a point estimate for the GP kernel parameters, rather than integrating over the predictions of the model using the full distribution over kernel parameters. In Snoek's paper for example, (see bottom of this page), they advocate integrating over the kernel parameter posterior, but we don't do that, mainly for speed reasons. Integrating would probably make distant regions more uncertain.
Another factor here is that our GP modeling function, fitrgp, does not allow the noise variance to be exactly zero. So we approximate the noise-free case by using a tiny but positive constant value of Sigma. That is why the model still thinks there is a nonzero probability of improvement at X=1.
Finally, in order to avoid overexploitation, we use a version of Bull's method (see reference on same page). This temporarily increases the global kernel amplitude parameter, which is one way to make distant points look more uncertain. It does this a maximum of 5 times until it finds a different "best" point. After 5 tries, it uses the best point. For noisy problems, that has the effect of decreasing the uncertainty at that point, and so eventually it will look elsewhere on future iterations. But in your deterministic case, the tiny variance can't decrease any more.
In your small example, bayesopt is finding the right answer. In your real problem, I assume it is not. Judging from your acquisition function plot, bayesopt seems to be estimating a tiny kernel scale for your problem. Is your problem really deterministic? If so, if it's a highly variable function, you may get better results by telling bayesopt that it's nondeterministic.

  0 Comments

Sign in to comment.


Answer by Vicki Taasti on 21 Jan 2019

Thank you very much for your quick reply, I appreciate this.
I have tested the non-deterministic option, as suggested. And this did give a more evenly distributed results (in the sense that the evaluation points were distributed over the whole feasible search space). However, as expected this also resulted in the Bayesian optimization testing the same points several times (75 iterations and 6 points were non-unique giving effectively only 69 iterations, and some wasted time to redo the 6 points).
Is there anyway this behavior can be changed?

  2 Comments

In addition to what Alan said, another thing you could try is gridsearch using bayesopt. It's not documented, but you can do gridsearch by passing 'grid' as the Acquisition Function and you can specifying the number of grid divisions per dimension. bayesopt will search the grid using uniform random sampling without replacement. Here's an example:
ba1=optimizableVariable('ba1',[1,100],'Type','integer');
ba2=optimizableVariable('ba2',[1,100],'Type','integer');
fun=@(x)opt_fun(x);
intial_ba=table(1);
results=bayesopt(fun,[ba1 ba2],'IsObjectiveDeterministic',true,...
'AcquisitionFunctionName', 'grid',...
'NumGridDivisions',20,...
'MaxObjectiveEvaluations',100,...
'XConstraintFcn',@(x)x.ba1<x.ba2);
plot(results,'all')
function res=opt_fun(x)
res=2*x.ba1 - x.ba2;
end
The value of NumGridDivisions can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions.
And yet another option is the 'random' acquisition function, which does uniform random sampling with replacement. Since you're using integers, this is equivalent to searching your integer grid with replacement.

Sign in to comment.


Answer by Alan Weiss
on 22 Jan 2019

If your objective function is smooth, then you should use fmincon as your optimizer, starting from a variety of initial points to look for a global optimum. If you have Global Optimization Toolbox, then MultiStart can help automate the procedure. See Refine Start Points and Isolated Global Minimum for examples and techniques. The techniques apply even if you do not have Global Optimization Toolbox, you can easily program them yourself.
If the objective function is nonsmooth, you should try patternsearch as your optimizer. See Table for Choosing a Solver.
All this assumes that your objective function is defined for continuous variables. If, as in your example, you have integer constraints, then there are several things you can try, including bayesopt as you have been doing. Another thing to try is exhaustive search on the integer variables, combined with the previous techniques. Integer optimization is a very tough problem in general.
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation

  0 Comments

Sign in to comment.