Why does Pattern Search take so long?

Hello,
Could anyone please help me figure out the following?
This is the first time I have tried the pattern search solver for my optimization problem with 29 unknowns. The problem only has lower and upper bounds. I have successfully worked with a number of local optimizers like lsqnonlin on the same problem.
I wonder why pattern search took a tremendous amount of time (compared with local solvers) for the same number of function evaluations (100,000 in this case). Here are the options I am using:
opts = psoptimset( 'Cache','on', 'CompletePoll','on', 'InitialMeshSize',0.1, ...
'MaxIter',1e4, 'MaxFunEvals',1e5, 'ScaleMesh','off' );
When I run the optimization (please see below), it hits the maximum number of function evaluations allowed. That in itself might not be good, but the real issue is the overall time being more than 4,000 seconds:
tic; [param, fval, exitflag, output] = patternsearch( fun, guess_new, [], [], [], [], ...
zeros(1,29), ones(1,29), [], opts ); toc
Maximum number of function evaluations exceeded: increase options.MaxFunEvals.
Elapsed time is 4131.745766 seconds.
When I simply run the same function this many times (100,000), it only takes around 800 seconds. Similarly, when passing the same function to fmincon or lsqnonlin, 100,000 function evaluations take 800-900 seconds.
Besides, from my point of view, the default poll method, 'GPS Positive basis 2N', should take 58 function evaluations per iteration: 2N polls for 29 unknowns, with 'CompletePoll' set to 'on'. If I limit MaxFunEvals to 100,000, the number of iterations taken before hitting the limit and stopping the optimization should be about 100000 / 58 = 1724. However, it can be seen from the output structure that the actual iteration count is greater than 1724:
output
output =
function: [function_handle]
problemtype: 'boundconstraints'
pollmethod: 'gpspositivebasis2n'
searchmethod: []
iterations: 1948
funccount: 100000
meshsize: 3.9063e-04
maxconstraint: 0
message: 'Maximum number of function evaluations exceeded: increase options.MaxFunEvals.'
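As a sanity check, my expected-iteration arithmetic is simply the following (a back-of-the-envelope sketch, not how patternsearch actually counts evaluations internally):

```matlab
% GPS Positive basis 2N polls up to 2*N points per iteration
% when 'CompletePoll' is 'on'.
N = 29;                         % number of unknowns
pollsPerIter = 2 * N;           % 58 evaluations per complete poll
maxFunEvals  = 1e5;
expectedIters = floor(maxFunEvals / pollsPerIter)   % 1724
```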
This, together with the timing issue, makes me wonder how pattern search worked in this particular case. I would greatly appreciate any help.
Thank you in advance,
Igor.

Accepted Answer

Alan Weiss
Alan Weiss on 6 Jan 2014
Edited: Alan Weiss on 6 Jan 2014
Perhaps the Cache option should be 'off'. Please let us know if that makes a difference.
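For instance, keeping your other settings unchanged (a sketch; with 'Cache' 'on', patternsearch compares every candidate point against the history of visited points, which can dominate run time when the objective itself is cheap):

```matlab
% Same options as in the question, but with the cache disabled.
opts = psoptimset( 'Cache','off', 'CompletePoll','on', 'InitialMeshSize',0.1, ...
    'MaxIter',1e4, 'MaxFunEvals',1e5, 'ScaleMesh','off' );
```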
If the problem is suitable for lsqnonlin or fmincon, then those solvers will undoubtedly be more efficient. As you probably know, patternsearch is less efficient than they are. patternsearch is recommended for nonsmooth functions, while Optimization Toolbox solvers are for smooth functions.
Alan Weiss
MATLAB mathematical toolbox documentation

9 Comments

Igor
Igor on 6 Jan 2014
Edited: Igor on 6 Jan 2014
Hello Alan,
You are right. Setting 'Cache' to 'off' resolved the issue. Thank you very much. Since the main problem pointed out in the title is gone, I am clicking "Accept this answer".
That said, would you be so kind as to advise me on a related problem without me having to create a new thread?
You are certainly right that lsqnonlin or fmincon are way more efficient than patternsearch when optimizing my 29 unknowns. They relatively quickly (around 200-300 seconds on my PC) converge to a good solution I aim to get. This is without adding noise to the objective.
The situation is completely different when I add some Gaussian noise with a standard deviation corresponding to what we would see in reality for the data I am working with. In this case, I cannot accept the solutions produced by fmincon and lsqnonlin. This is why I turned to global optimizers. I have read in some papers that derivative-free methods might tolerate noise much better.
However, my experience with patternsearch has not been good so far even without noise: it quickly reaches a relatively flat region, and then further improvements become extremely slow. I have tried a number of tuning parameters, including Latin hypercube as the search method at each iteration. At the same time, the MATLAB docs and your comments in other threads make me believe patternsearch is generally more efficient than GA or simulated annealing.
Does this mean that my 29 unknowns and - most probably - a relatively flat problem leave no chance to the available global optimizers? If so, the only option is to reduce the noise to a certain level, but this is a very complex task.
Thank you in advance for any thoughts.
Igor
Alan Weiss
Alan Weiss on 6 Jan 2014
Edited: Alan Weiss on 6 Jan 2014
If you will indulge me in a bit of a philosophical discourse, there is something strange about optimizing "noisy" functions. There are several issues:
  • Reproducibility. If you evaluate the function twice at the same location, will you get the same answer? In other words, is the "noisy" function really a function, or is it a stochastic sample?
  • Continuity. Is the noisy function discontinuous? For example, is it a smooth function plus an independent random sample from a Gaussian at every point? In that case, the function might not even be bounded in any neighborhood of a point, because taking an infinite number of samples of independent Gaussian variables yields an infinite maximum.
  • Effective computation. Do you really have a way to effectively compute the values of your function, or are your computations an approximation to your true values (I mean, even less accurate than the usual considerations for floating-point arithmetic)?
Now what do all these questions have to do with optimization? Simply put, unless you have a reproducible, somewhat regular, effectively computable function, then there is no way that any optimization algorithm in our toolboxes will satisfy you. The algorithms are all designed under the assumption that your function gives reproducible results, that it has finite values in any neighborhood, and that you are really computing the values of the function, not some proxy for those values.
If you have bounded noise, and a relatively small amount of it, then patternsearch can sometimes be an effective optimizer, as this example shows. But if your problem has a relatively large amount of noise, and the underlying function to optimize is flat, then you can certainly have problems. A random fluctuation can easily give a local minimum, at least as far as the computations give, and yet you can still be far from a real optimum, whatever that might mean in this context.
So yes, patternsearch can tolerate some noise, but it is not a miracle algorithm, able to discount spurious fluctuations. By design it accepts every function evaluation as being error-free, so if it samples a random fluctuation that gives a very low value then it quickly decides that it found the minimum. If you don't like that then you might have your objective function do something else, like take several samples before returning a value.
I'm sorry if I sound negative. I really do believe that patternsearch is a fine solver. But random noise is tough for any optimizer.
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation
P.S. Yes, patternsearch is generally superior to GA or simulated annealing. I believe that its tuning options are much easier to understand than GA's. You have likely already tried starting with various initial mesh, a variety of start points, and various patternsearch poll algorithms. I have nothing much to add to what you have already investigated. Optimization is a tough problem!
Igor
Igor on 7 Jan 2014
Edited: Igor on 7 Jan 2014
Hello Alan,
Thank you very much for your time and thorough answer. I do appreciate your involvement.
Let me please try to be more specific about my function.
Continuity. It is continuous and differentiable within the bounds provided and, most likely, outside the bounds. The unconstrained Levenberg-Marquardt finds the global minimum (which is zero) both effectively and efficiently when no noise is added, no matter what the initial guess is.
Effective computation. The function is easily computed for any feasible set of points. All computation is done in double precision. It is not a simple function, and obtaining its gradients symbolically would be totally impractical. However, the function is deterministic, and the computation time required for its evaluation is low (100 evaluations in a fraction of a second).
Reproducibility. Since the function is deterministic (without noise), everything is reproducible. I get the same nice result from 'lsqnonlin' most of the time when starting from different points. If noise is added, I still get very close results from 'lsqnonlin' for different start points. However, these results are somewhat away from the true noise-free optimum: the 29 estimated parameters differ from their true values by more than the 2-3% I would be willing to accept.
Amount of noise and its impact on the objective. The noise level added is just 1e-3 per unit or even less. For example, I am using a standard deviation of 30 volts for a true value of 200,000 volts. I assume this is generally not much, but my problem is not really well conditioned. This might have to do with those "flat" regions encountered during the course of optimization.
Still, I do believe the noise added should be tolerated. A simple experiment not involving optimization: I fix 24 out of my 29 unknowns, and create a dense 5D grid from the rest (i.e. the 5 unknowns having the most impact on the objective). I then evaluate the objective for every combination of the 5 unknowns being varied, with and without noise added. The best parameters minimizing the objective are the same in both cases.
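A sketch of that experiment is below (all variable names — fixedParams, freeIdx, objective, measClean, measNoisy — are hypothetical stand-ins for my actual code):

```matlab
% Vary 5 of the 29 unknowns on a dense grid, hold the other 24 fixed,
% and compare the argmin of the objective with and without noise.
vals = linspace(0, 1, 11);               % grid for each free unknown
[g1, g2, g3, g4, g5] = ndgrid(vals, vals, vals, vals, vals);
bestClean = []; bestNoisy = [];
fClean = Inf;   fNoisy = Inf;
for k = 1:numel(g1)
    p = fixedParams;                     % the 24 fixed unknowns (assumed given)
    p(freeIdx) = [g1(k) g2(k) g3(k) g4(k) g5(k)];
    fc = objective(p, measClean);        % noise-free measurements
    fn = objective(p, measNoisy);        % noisy measurements
    if fc < fClean, fClean = fc; bestClean = p(freeIdx); end
    if fn < fNoisy, fNoisy = fn; bestNoisy = p(freeIdx); end
end
isequal(bestClean, bestNoisy)            % same argmin in both cases
```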
This experiment makes me believe that 'lsqnonlin' and 'fmincon' have a hard time approximating gradients in the presence of noise, but the objective itself should not be distorted too much, hence my interest in global optimizers.
Pattern Search is robust, but it is probably the high dimensionality that makes it much less efficient on this particular task. It seems it would glide on a plateau forever before falling down to another plateau. And you are right that I have tried a variety of tuning parameters for the algorithms.
Wrapping up, I'll definitely try GA and simulated annealing and report back here in the unlikely case that they perform better.
Also, Alan, could you please answer the two questions below?
1. Could you please elaborate a little on "If you don't like that then you might have your objective function do something else, like take several samples before returning a value."?
2. Could you say if Differential evolution algorithms will soon be included in the Global Optimization Toolbox? It seems from various publications they are promising enough. I know there is some user-supplied code on MATLAB Central, but what about the official release?
Thank you very much and have a great day,
Igor.
1. All I meant was, if the function is stochastic, you can average the values at a point over several evaluations before returning a value. This can help return a more accurate value for certain kinds of noise (maybe not yours).
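A minimal sketch of what this averaging could look like, assuming a stochastic rawObjective whose noise is re-drawn on every call (names are hypothetical):

```matlab
% Average several evaluations of a stochastic objective before
% returning a value. Note: this only helps when the noise is
% regenerated on each call, not when it is baked into fixed
% measured data as in Igor's case.
function f = averagedObjective(x, rawObjective, nSamples)
    acc = 0;
    for k = 1:nSamples
        acc = acc + rawObjective(x);
    end
    f = acc / nSamples;
end
```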
2. I am not supposed to comment on any future features. Sorry.
One other thing. There is a bit of documentation on optimizing simulations or solutions to ODEs, and it might be relevant to your "small noise" optimization; in particular, see the section on finite differences.
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation
Hello Alan,
Thank you again for the reply.
Regarding my function, I might not have been clear enough, which is why I have been trying to understand your suggestion without much success. Sorry about that.
My objective has the following form for 'lsqnonlin':
Obj = [ ( real(x1) - real(meas1) ), ( imag(x1) - imag(meas1) ), ...
( real(x2) - real(meas2) ), ( imag(x2) - imag(meas2) ), ...
...
( real(xN) - real(measN) ), ( imag(xN) - imag(measN) ) ];
where
[ x1, x2, ... xN ]
is a vector of computed data, and
[ meas1, meas2, ... measN ]
is a vector of measured real-field data that my computed vector should match as closely as possible.
When it comes to 'fmincon' and global optimizers, I supply the norm:
Obj = ( norm(Obj) )^2;
The vector of the measured data can be without noise (an ideal, impractical scenario) or with some Gaussian noise added. In the latter case, the noise is already present in the sampled measured data, so I do not add any noise dynamically during the optimization. This is why it seems to me that evaluating the function twice at the same location would lead to exactly the same result.
I think the contamination of the objective caused by the noise might not be too significant. However, since local solvers use derivatives, the gradient is distorted by the contaminated objective as well. I believe the formula below is correct:
Gradient = 2 * Jacobian' * Obj';
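As a quick check of that identity, here is a toy sum-of-squares problem (the residual function and Jacobian are made up for illustration) comparing the formula against central finite differences of F(x) = norm(Obj)^2:

```matlab
% Toy residual vector r and its analytic Jacobian J
r = @(x) [x(1)^2 - 1;  x(1)*x(2);  x(2) - 3];
J = @(x) [2*x(1), 0;  x(2), x(1);  0, 1];
x0 = [0.5; 2];
grad = 2 * J(x0)' * r(x0);               % the formula above
% Central finite differences of F(x) = norm(r(x))^2
F = @(x) norm(r(x))^2;
h = 1e-6;
fd = zeros(2,1);
for i = 1:2
    e = zeros(2,1); e(i) = h;
    fd(i) = (F(x0 + e) - F(x0 - e)) / (2*h);
end
max(abs(grad - fd))                      % close to zero (finite-difference error only)
```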
Given all this, could you please try to suggest anything else to diminish the bad noise impact, maybe based on your own experience?
Thank you in advance,
Igor.
Igor,
I think you misled us with your original question. Are you confusing the concept of a continuous model with minimizing an objective function against real-life data that will have variability (i.e., noise)?
Once you have values for your 29 parameters, your model is continuous and consistent but that does not mean that when you are fitting those 29 parameters with real data that your model is well defined to these types of solvers.
Your problem is not a true optimization problem but, so to speak, a least-squares fit. You're just trying to use the optimization solvers to do the heavy lifting.
I do the same thing. I have kinetic models with diffusion and chemical pathways with hundreds of parameters that I want to fit. My model uses ode15s to solve for the predicted values, which I compare to my measured values.
I have used all of the MATLAB solvers and tools from outside (from NAG and Nitro), and if my data is really noisy, I usually bite the bullet with fminsearch or some other Nelder-Mead simplex approach for a first guess. Even then, I could be up a really bad creek with no paddle or canoe...
For other problems, I get lucky.
fminsearch costs me in performance, but after 24 hours or so I end up with a really decent first guess... Sometimes, if I manage not to screw up my experiments, the first guess is better than decent and makes a little sense.
Hi Marc,
Thank you for your thoughts.
Unlike you, I cannot change the way the data is collected; I just have to deal with what I have. However, I do know the data quality, i.e., how much noise it contains in a specific case. Therefore, I have to make my model tolerate this noise somehow.
Certainly, ideal fitting is impossible in the presence of noise. I would happily settle for obtaining my 29 estimates such that they all lie within ±2-3% of the corresponding true values.
I would also argue that without noise, the results returned by either 'lsqnonlin' or 'fmincon' (it does not matter much which) are almost ideal, even though the underlying model used for fitting is not the best one. I do this on purpose: there is a 1-2% disagreement between the true model, which satisfies the denoised data with 100% accuracy, and the model I am using. Still, the optimization takes all that into account and simply corrects some less important coefficients to make one model fit the other.
When I add noise, everything becomes much more difficult. I am not using ODEs, just some matrix-based methods like solving for eigenvalues.
Moreover, I have tried 'fminsearch' without success. No wonder though:
"FMINSEARCH is rigorous for 1 unknown variable and empirically successful for small numbers of variables. More than 6 unknowns would be pushing it, though."
(from one of the other threads here on MATLAB Answers).
Finally, the initial guess for my task does not appear to matter much. I could supply a completely true guess i.e. the one satisfying denoised data, but after running optimization with noisy data supplied, the "true" parameters would be turned into some unsatisfactory ones.
Thank you anyway.
Based on your latest description, it seems to me that lsqnonlin accompanied by larger-than-default finite difference steps, and possibly central finite differences, is your best bet.
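A sketch of that suggestion (option names per the Optimization Toolbox of this era; the step size here is an assumption to be tuned, not a recommendation):

```matlab
% Central finite differences with a larger relative step, so the
% differences average over the noise instead of amplifying it.
opts = optimoptions('lsqnonlin', ...
    'FinDiffType',    'central', ...   % central finite differences
    'FinDiffRelStep', 1e-2);           % much larger than the default
[param, resnorm] = lsqnonlin(fun, guess_new, zeros(1,29), ones(1,29), opts);
```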
Good luck,
Alan Weiss
MATLAB mathematical toolbox documentation
Alan,
Thank you. I agree that 'lsqnonlin' is the best one for this task; I have been using central finite differences for a long time. I guess I will have to deal with the noise by trying to reduce it somehow; otherwise, the optimization will suffer too much.


More Answers (0)

Asked: 4 Jan 2014

Commented: 12 Jan 2014
