Solve Nonlinear Problem with Many Variables
This example shows how to handle a large number of variables in a nonlinear problem. In general, for this type of problem:
- Use the low-memory BFGS (LBFGS) Hessian approximation. This option is available in the default fminunc and fmincon algorithms.
- If you have an explicit gradient, you can also use a finite-difference Hessian approximation together with the 'cg' subproblem algorithm.
- If you have an explicit Hessian, formulate the Hessian as sparse.
Although not part of this example, if you have a structured problem and can evaluate the product of the Hessian with an arbitrary vector, try using a Hessian multiply function. See Minimization with Dense Structured Hessian, Linear Equalities.
The example uses the hfminunc0obj helper function shown at the end of this example for the general nonlinear solvers fminunc and fmincon. This function is an N-dimensional generalization of Rosenbrock's function, a difficult function to minimize numerically. The minimum value of 0 is attained at the unique point x = ones(N,1).
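In the notation used by the helper functions, the objective implemented in hfminunc0obj is

$$f(x) = \sum_{k=1}^{N-1} \Bigl[\, 100\bigl(x_k^2 - x_{k+1}\bigr)^2 + \bigl(x_k - 1\bigr)^2 \,\Bigr].$$

Each term is nonnegative and vanishes exactly when x_k = 1 and x_{k+1} = x_k^2, so the minimum value of 0 is attained only at x = ones(N,1).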
The function is an explicit sum of squares. Therefore, the example also shows the efficiency of using a least-squares solver. For the least-squares solver lsqnonlin, the example uses the hlsqnonlin0obj helper function shown at the end of this example as a vector objective function that is equivalent to the hfminunc0obj function.
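Writing each summand as a pair of squared residuals makes the equivalence of the two helper functions explicit:

$$f(x) = \sum_{k=1}^{N-1} \Bigl[ \bigl(10(x_k^2 - x_{k+1})\bigr)^2 + (x_k - 1)^2 \Bigr] = \lVert r(x) \rVert^2, \qquad r_{2k-1}(x) = 10\bigl(x_k^2 - x_{k+1}\bigr), \quad r_{2k}(x) = x_k - 1.$$

The hlsqnonlin0obj helper function returns this 2(N-1)-element residual vector r(x), and lsqnonlin minimizes its squared norm, which equals the hfminunc0obj objective.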
Problem Setup
Set the problem to use the hfminunc0obj objective function for 1000 variables. Set the initial point x0 to –2 for each variable.
fun = @hfminunc0obj;
N = 1e3;
x0 = -2*ones(N,1);
For the initial options, specify no display and no limit on the number of function evaluations or iterations.
options = optimoptions("fminunc",Display="none",...
    MaxFunctionEvaluations=Inf,MaxIterations=Inf);
Set up a table to hold the data. Specify labels for the eight solver runs, and define places to collect the run time, returned function value, exit flag, number of iterations, and time per iteration.
ExperimentLabels = ["BFGS_NoGrad", "LBFGS_NoGrad",...
    "BFGS_Grad", "LBFGS_Grad", "Analytic", "fin-diff-grads",...
    "LSQ_NoJacob", "LSQ_Jacob"];
timetable = table('Size',[8 5],...
    'VariableTypes',["double" "double" "double" "double" "double"],...
    'VariableNames',["Time" "Fval" "Eflag" "Iters" "TimePerIter"],...
    'RowNames',ExperimentLabels);
The following code sections create the appropriate options for each solver run, and collect the output directly into the table whenever possible.
BFGS Hessian Approximation, No Gradient
opts = options;
opts.HessianApproximation = 'bfgs';
opts.SpecifyObjectiveGradient = false;
overallTime = tic;
[~,timetable.Fval("BFGS_NoGrad"),timetable.Eflag("BFGS_NoGrad"),output] = ...
    fminunc(fun, x0, opts);
timetable.Time("BFGS_NoGrad") = toc(overallTime);
timetable.Iters("BFGS_NoGrad") = output.iterations;
timetable.TimePerIter("BFGS_NoGrad") = ...
    timetable.Time("BFGS_NoGrad")/timetable.Iters("BFGS_NoGrad");
LBFGS Hessian Approximation, No Gradient
opts = options;
opts.HessianApproximation = 'lbfgs';
opts.SpecifyObjectiveGradient = false;
overallTime = tic;
[~,timetable.Fval("LBFGS_NoGrad"),timetable.Eflag("LBFGS_NoGrad"),output] = ...
    fminunc(fun, x0, opts);
timetable.Time("LBFGS_NoGrad") = toc(overallTime);
timetable.Iters("LBFGS_NoGrad") = output.iterations;
timetable.TimePerIter("LBFGS_NoGrad") = ...
    timetable.Time("LBFGS_NoGrad")/timetable.Iters("LBFGS_NoGrad");
BFGS with Gradient
opts = options;
opts.HessianApproximation = 'bfgs';
opts.SpecifyObjectiveGradient = true;
overallTime = tic;
[~,timetable.Fval("BFGS_Grad"),timetable.Eflag("BFGS_Grad"),output] = ...
    fminunc(fun, x0, opts);
timetable.Time("BFGS_Grad") = toc(overallTime);
timetable.Iters("BFGS_Grad") = output.iterations;
timetable.TimePerIter("BFGS_Grad") = ...
    timetable.Time("BFGS_Grad")/timetable.Iters("BFGS_Grad");
LBFGS with Gradient
opts = options;
opts.HessianApproximation = 'lbfgs';
opts.SpecifyObjectiveGradient = true;
overallTime = tic;
[~,timetable.Fval("LBFGS_Grad"),timetable.Eflag("LBFGS_Grad"),output] = ...
    fminunc(fun, x0, opts);
timetable.Time("LBFGS_Grad") = toc(overallTime);
timetable.Iters("LBFGS_Grad") = output.iterations;
timetable.TimePerIter("LBFGS_Grad") = ...
    timetable.Time("LBFGS_Grad")/timetable.Iters("LBFGS_Grad");
Analytic Hessian, 'trust-region' Algorithm
opts = options;
opts.Algorithm = 'trust-region';
opts.SpecifyObjectiveGradient = true;
opts.HessianFcn = "objective";
overallTime = tic;
[~,timetable.Fval("Analytic"),timetable.Eflag("Analytic"),output] = ...
    fminunc(fun, x0, opts);
timetable.Time("Analytic") = toc(overallTime);
timetable.Iters("Analytic") = output.iterations;
timetable.TimePerIter("Analytic") = ...
    timetable.Time("Analytic")/timetable.Iters("Analytic");
Finite-Difference Hessian with Gradient, fmincon Solver
opts = optimoptions("fmincon","SpecifyObjectiveGradient",true,...
    "Display","none","HessianApproximation","finite-difference",...
    SubproblemAlgorithm="cg",MaxFunctionEvaluations=Inf,MaxIterations=Inf);
overallTime = tic;
[~,timetable.Fval("fin-diff-grads"),timetable.Eflag("fin-diff-grads"),output] = ...
    fmincon(fun, x0, [],[],[],[],[],[],[],opts);
timetable.Time("fin-diff-grads") = toc(overallTime);
timetable.Iters("fin-diff-grads") = output.iterations;
timetable.TimePerIter("fin-diff-grads") = ...
    timetable.Time("fin-diff-grads")/timetable.Iters("fin-diff-grads");
Least Squares, No Jacobian
lsqopts = optimoptions("lsqnonlin","Display","none",...
    "MaxFunctionEvaluations",Inf,"MaxIterations",Inf);
fun = @hlsqnonlin0obj;
overallTime = tic;
[~,timetable.Fval("LSQ_NoJacob"),~,timetable.Eflag("LSQ_NoJacob"),output] = ...
    lsqnonlin(fun, x0, [],[],lsqopts);
timetable.Time("LSQ_NoJacob") = toc(overallTime);
timetable.Iters("LSQ_NoJacob") = output.iterations;
timetable.TimePerIter("LSQ_NoJacob") = ...
    timetable.Time("LSQ_NoJacob")/timetable.Iters("LSQ_NoJacob");
Least Squares with Jacobian
lsqopts.SpecifyObjectiveGradient = true;
overallTime = tic;
[~,timetable.Fval("LSQ_Jacob"),~,timetable.Eflag("LSQ_Jacob"),output] = ...
    lsqnonlin(fun, x0, [],[],lsqopts);
timetable.Time("LSQ_Jacob") = toc(overallTime);
timetable.Iters("LSQ_Jacob") = output.iterations;
timetable.TimePerIter("LSQ_Jacob") = ...
    timetable.Time("LSQ_Jacob")/timetable.Iters("LSQ_Jacob");
Examine Results
disp(timetable)
Time Fval Eflag Iters TimePerIter
______ __________ _____ _____ ___________
BFGS_NoGrad 110.44 5.0083e-08 1 7137 0.015475
LBFGS_NoGrad 53.143 2.476e-07 1 4902 0.010841
BFGS_Grad 35.491 2.9865e-08 1 7105 0.0049952
LBFGS_Grad 1.2056 9.7505e-08 1 4907 0.0002457
Analytic 7.0991 1.671e-10 3 2301 0.0030852
fin-diff-grads 5.217 1.1422e-15 1 1382 0.003775
LSQ_NoJacob 94.708 3.7969e-25 1 1664 0.056916
LSQ_Jacob 6.5225 3.0056e-25 1 1664 0.0039197
The timing results show the following:
- For this problem, the LBFGS Hessian approximation with gradients is the fastest by far.
- The next fastest solver runs are fmincon with a Hessian approximated by finite differences of gradients, trust-region fminunc with an analytic gradient and Hessian, and lsqnonlin with an analytic Jacobian.
- The fminunc BFGS algorithm without a gradient has similar speed to the lsqnonlin solver without a Jacobian. Note that lsqnonlin requires many fewer iterations than fminunc for this problem, but each iteration takes much longer.
- Derivatives make a large difference in speed for all solvers.
Helper Functions
The following code creates the hfminunc0obj helper function.
function [f,G,H] = hfminunc0obj(x)
% Rosenbrock function in N dimensions
N = numel(x);
xx = x(1:N-1);
xx_plus = x(2:N);
f_vec = 100*(xx.^2 - xx_plus).^2 + (xx - 1).^2;
f = sum(f_vec);
if (nargout >= 2) % Gradient
    G = zeros(N,1);
    for k = 1:N
        if (k == 1)
            G(k) = 2*(x(k)-1) + 400*x(k)*(x(k)^2-x(k+1));
        elseif (k == N)
            G(k) = 200*x(k) - 200*x(k-1)^2;
        else
            G(k) = 202*x(k) - 200*x(k-1)^2 - 400*x(k)*(x(k+1) - x(k)^2) - 2;
        end
    end
    if nargout >= 3 % Hessian
        H = spalloc(N,N,3*N); % Sparse tridiagonal Hessian
        for i = 2:(N-1)
            H(i,i) = 202 + 1200*x(i)^2 - 400*x(i+1);
            H(i,i-1) = -400*x(i-1);
            H(i,i+1) = -400*x(i);
        end
        H(1,1) = 2 + 1200*x(1)^2 - 400*x(2);
        H(1,2) = -400*x(1);
        H(N,N) = 200;
        H(N,N-1) = -400*x(N-1);
    end
end
end
The following code creates the hlsqnonlin0obj helper function.
function [f,G] = hlsqnonlin0obj(x)
% Rosenbrock function in N dimensions
N = numel(x);
xx = x(1:N-1);
xx_plus = x(2:N);
f_vec = [10*(xx.^2 - xx_plus), (xx - 1)];
f = reshape(f_vec',[],1); % Vector of length 2*(N-1)
% Jacobian
if (nargout >= 2)
    G = spalloc(2*(N-1),N,3*N); % Jacobian size 2*(N-1)-by-N with 3*N nonzeros
    for k = 1:(N-1)
        G(2*k-1,k) = 10*2*x(k);
        G(2*k-1,k+1) = -10;
        G(2*k,k) = 1;
    end
end
end