## Checking Validity of Gradients or Jacobians

Optimization solvers often run more quickly and reliably when you provide first derivatives of the objective and nonlinear constraint functions. However, you can easily make a mistake when programming these derivatives. To guard against incorrect derivatives, compare the output of the programmed derivative against a finite-difference approximation by using the `checkGradients` function.
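`checkGradients` compares each supplied partial derivative to a finite-difference estimate. The default check uses a standard forward difference, shown here for reference (the step size that `checkGradients` chooses internally is an implementation detail):

`$\frac{\partial f}{\partial {x}_{j}}\left(x\right)\approx \frac{f\left(x+h{e}_{j}\right)-f\left(x\right)}{h},$`

where `ej` is the `j`th unit vector and `h` is a small positive step.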

### Check Gradient of Objective Function

Consider Rastrigin's function as the objective:

`$\text{Ras}\left(x\right)=20+{x}_{1}^{2}+{x}_{2}^{2}-10\mathrm{cos}\left(2\pi {x}_{1}+2\pi {x}_{2}\right).$`

The following function computes Rastrigin's function and its gradient correctly.

```
function [f,g] = ras(x)
f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2));
if nargout > 1
    g(2) = 2*x(2) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2));
    g(1) = 2*x(1) + 20*pi*sin(2*pi*x(1) + 2*pi*x(2));
end
end
```

Check that the `ras` function correctly computes the gradient near a random initial point. To see details of the calculation, set the `Display` name-value argument to `"on"`.

```
rng default
x0 = randn(1,2);
valid = checkGradients(@ras,x0,Display="on")
```
```
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 7.32446e-08.
checkGradients successfully passed.
____________________________________________________________

valid =
  logical
   1
```

The `badras` function computes the gradient incorrectly.

```
function [f,g] = badras(x)
f = 20 + x(1)^2 + x(2)^2 - 10*cos(2*pi*x(1) + 2*pi*x(2));
if nargout > 1
    g(2) = 2*x(2) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2));
    g(1) = 2*x(1) + 40*pi*sin(2*pi*x(1) + 2*pi*x(2));
end
end
```

`checkGradients` correctly reports that the `badras` function computes gradients incorrectly.

`valid = checkGradients(@badras,x0,Display="on")`
```
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 0.494224.
Supplied derivative element (1,1): 92.9841
Finite-difference derivative element (1,1): 47.0291
checkGradients failed.
Supplied derivative and finite-difference approximation
are not within 'Tolerance' (1e-06).
____________________________________________________________

valid =
  logical
   0
```

Use the correct gradient to find a local minimum of `ras(x)`.

```
rng default
x0 = randn(1,2);
options = optimoptions("fminunc",SpecifyObjectiveGradient=true);
[x,fval,exitflag,output] = fminunc(@ras,x0,options)
```
```
Local minimum found.

Optimization completed because the size of the gradient is less than
the value of the optimality tolerance.

x =
    0.9975    0.9975

fval =
   11.9949

exitflag =
     1

output = 
  struct with fields:

       iterations: 9
        funcCount: 13
         stepsize: 5.7421e-05
     lssteplength: 1
    firstorderopt: 1.2426e-05
        algorithm: 'quasi-newton'
          message: 'Local minimum found.↵↵Optimization completed because the size of the gradient is less than↵the value of the optimality tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The first-order optimality measure, 2.482746e-07, is less ↵than options.OptimalityTolerance = 1.000000e-06.'
```

The solver takes 9 iterations and only 13 function evaluations to reach a local minimum. Without the gradient, the solver takes more function evaluations.

`[x,fval,exitflag,output] = fminunc(@ras,x0) % No options`
```
Local minimum found.
...
output = 
  struct with fields:

    iterations: 9
     funcCount: 39
...
```

This time, the solver takes 39 function evaluations to reach essentially the same solution as before.

### Check Jacobian of Vector Objective Function

The `lsqnonlin`, `lsqcurvefit`, and `fsolve` functions use vector-valued objective functions. (The objective function that these solvers try to minimize is the sum of squares of the vector-valued objective.) A Jacobian is the first derivative of a vector-valued objective function. The Jacobian `J` of a function `F(x)` with `m` components, where the variable `x` has `n` components, is an `m`-by-`n` matrix. `J(i,j)` is the partial derivative of `Fi(x)` with respect to `xj`.
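In symbols, each Jacobian entry is a partial derivative:

`${J}_{ij}\left(x\right)=\frac{\partial {F}_{i}\left(x\right)}{\partial {x}_{j}},\quad i=1,\dots ,m,\quad j=1,\dots ,n.$`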

For example, the `vecfun` code calculates the Jacobian for a function of four parameters (`a`, `b`, `c`, and `d` in the code) and a number of components that depends on the input data `t`. The `t` vector corresponds to a set of times, and each time leads to two entries in the objective function `F(x,t)`.

```
function [F,J] = vecfun(x,t)
t = t(:); % Reshape t to a column vector
a = x(1);
b = x(2);
c = x(3);
d = x(4);
nt = length(t);
F = [a*exp(-b*t) + c, ...
     c*exp(-d*t)]; % numel(F) = 2*nt
if nargout > 1
    J = zeros(2*nt,4);
    J(1:nt,1) = exp(-b*t); % First nt components corresponding to a*exp(-b*t) + c
    J(1:nt,2) = -t.*a.*exp(-b*t);
    J(1:nt,3) = ones(nt,1);
    J(nt+1:2*nt,3) = exp(-d*t); % Second nt components corresponding to c*exp(-d*t)
    J(nt+1:2*nt,4) = -t.*c.*exp(-d*t);
end
end
```

Validate the Jacobian calculation in `vecfun`.

```
t = [-1/2 1/2 1]; % Three times
x = [1 2 3 4]; % Arbitrary x point
valid = checkGradients(@(x)vecfun(x,t),x,Display="on")
```
```
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 1.51598e-08.
checkGradients successfully passed.
____________________________________________________________

valid =
  logical
   1
```

Create artificial response data corresponding to the `vecfun` function at a set of times. Use the Jacobian to help fit parameters of the function to the data.

```
t = linspace(1,5);
x0 = [1 2 3 4];
rng default
x01 = rand(1,4);
y = vecfun(x0,t);
y = y + 0.1*randn(size(y)); % Add noise to response
options = optimoptions("lsqcurvefit",SpecifyObjectiveGradient=true);
[x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y,[],[],options)
```
```
Local minimum possible.

lsqcurvefit stopped because the final change in the sum of squares relative to 
its initial value is less than the value of the function tolerance.

x =
    0.9575    1.4570    2.9894    3.7289

resnorm =
    2.2076

exitflag =
     3

output = 
  struct with fields:

      firstorderopt: 1.0824e-04
         iterations: 11
          funcCount: 12
       cgiterations: 0
          algorithm: 'trust-region-reflective'
           stepsize: 0.0111
            message: 'Local minimum possible.↵↵lsqcurvefit stopped because the final change in the sum of squares relative to ↵its initial value is less than the value of the function tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative sum of squares (r) is changing↵by less than options.FunctionTolerance = 1.000000e-06.'
       bestfeasible: []
    constrviolation: []
```

The returned solution vector, `[0.9575,1.4570,2.9894,3.7289]`, is close to the original vector `[1,2,3,4]`. The solver takes just 12 function evaluations. Without the Jacobian, the solver takes more evaluations.

`[x,resnorm,~,exitflag,output] = lsqcurvefit(@vecfun,x01,t,y) % No options`
```
Local minimum possible.
...
output = 
  struct with fields:

    firstorderopt: 1.0825e-04
       iterations: 11
        funcCount: 60
...
```

This time, the solver takes 60 function evaluations to reach essentially the same solution as before.

### Modify Finite-Difference Options and `checkGradients` Arguments

Sometimes `checkGradients` can give an incorrect result:

• For a correct gradient, `checkGradients` can give a false report of an invalid check. Typically, this occurs because the function has a relatively large second derivative, so the finite-difference estimate is inaccurate. The false report can also occur because the value of the `Tolerance` name-value argument is too small.

• For an incorrect gradient, `checkGradients` can give a false report of a valid check. Typically, this false report occurs because the value of the `Tolerance` name-value argument is too large, or because the results contain a coincidentally matching derivative.

To check the gradient more carefully when you get a false report of an invalid check, change the finite-difference options. For example, `checkGradients` incorrectly reports that the `ras` gradient is invalid starting from the point `[-2,4]`.

`valid = checkGradients(@ras,[-2,4],Display="on")`
```
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 1.88497e-06.
Supplied derivative element (1,2): 6.21515
Finite-difference derivative element (1,2): 6.21516
checkGradients failed.
Supplied derivative and finite-difference approximation
are not within 'Tolerance' (1e-06).
____________________________________________________________

valid =
  logical
   0
```

Set the `FiniteDifferenceType` option to `"central"` and test again.

```
opts = optimoptions("fmincon",FiniteDifferenceType="central");
valid = checkGradients(@ras,[-2,4],opts,Display="on")
```
```
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 1.10182e-09.
checkGradients successfully passed.
____________________________________________________________

valid =
  logical
   1
```

Central finite differences are typically more accurate. In this case, the relative difference between the computed gradient and the central finite-difference estimate is about 1e-9. The default forward finite difference gives a relative difference of about 2e-6.
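The accuracy difference follows from the Taylor expansions of the two difference formulas: the forward difference has truncation error of order `h`, while the central difference error is of order `h^2`:

`$\frac{f\left(x+h\right)-f\left(x\right)}{h}={f}^{\prime }\left(x\right)+O\left(h\right),\quad \frac{f\left(x+h\right)-f\left(x-h\right)}{2h}={f}^{\prime }\left(x\right)+O\left({h}^{2}\right).$`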

Alternatively, check the validity of the gradient by loosening the tolerance from the default 1e-6 to 1e-5.

`valid = checkGradients(@ras,[-2,4],Tolerance=1e-5,Display="on")`
```
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 1.88497e-06.
checkGradients successfully passed.
____________________________________________________________

valid =
  logical
   1
```

In this case, `checkGradients` correctly reports that the gradient is valid. The looser tolerance does not cause the `badras` function to pass the check.

`valid = checkGradients(@badras,[-2,4],Tolerance=1e-5,Display="on")`
```
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 0.400823.
Supplied derivative element (1,2): 4.4368
Finite-difference derivative element (1,2): 6.21516
checkGradients failed.
Supplied derivative and finite-difference approximation
are not within 'Tolerance' (1e-05).
____________________________________________________________

valid =
  logical
   0
```

### Check Derivatives of Nonlinear Constraints

The `unitdisk2` function correctly calculates a constraint function and its gradient to keep `x` inside the disk of radius 1.

```
function [c,ceq,gc,gceq] = unitdisk2(x)
c = x(1)^2 + x(2)^2 - 1;
ceq = [];
if nargout > 2
    gc = [2*x(1);2*x(2)];
    gceq = [];
end
end
```

The `unitdiskb` function incorrectly calculates the gradient of the constraint function.

```
function [c,ceq,gc,gceq] = unitdiskb(x)
c = x(1)^2 + x(2)^2 - 1;
ceq = [];
if nargout > 2
    gc = [x(1);x(2)]; % Gradient incorrect: off by a factor of 2
    gceq = [];
end
end
```

Set the `IsConstraint` name-value argument to `true` and check the gradient at the point `x0 = randn(1,2)`.

```
rng default
x0 = randn(1,2);
valid = checkGradients(@unitdisk2,x0,IsConstraint=true,Display="on")
```
```
____________________________________________________________
Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 1.53459e-08.
checkGradients successfully passed.
____________________________________________________________

valid =
  1×2 logical array
   1   1
```

`checkGradients` correctly reports that the gradient of `unitdisk2` is valid.

`valid = checkGradients(@unitdiskb,x0,IsConstraint=true,Display="on")`
```
____________________________________________________________
Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 1.
Supplied derivative element (2,1): 1.8324
Finite-difference derivative element (2,1): 3.66479
checkGradients failed.
Supplied derivative and finite-difference approximation
are not within 'Tolerance' (1e-06).
____________________________________________________________

valid =
  1×2 logical array
   0   1
```

`checkGradients` correctly reports that the gradient of `unitdiskb` is not valid.

You can use the `checkGradients` function to stop a script when a finite-difference approximation does not match the corresponding gradient function.

```
rng default
x0 = randn(1,2);
opts = optimoptions("fmincon",FiniteDifferenceType="central");
assert(checkGradients(@ras,x0,opts,Display="on"))
assert(all(checkGradients(@unitdisk2,x0,opts, ...
    IsConstraint=true,Display="on")))
[x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdisk2)
```
```
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 7.52301e-10.
checkGradients successfully passed.
____________________________________________________________

____________________________________________________________
Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 1.44672e-11.
checkGradients successfully passed.
____________________________________________________________

Local minimum found that satisfies the constraints.

Optimization completed because the objective function is non-decreasing in 
feasible directions, to within the value of the optimality tolerance, and 
constraints are satisfied to within the value of the constraint tolerance.

x =
    0.4987    0.4987

fval =
   10.4987
```

Run the same script with the incorrect `unitdiskb` constraint function. Note that the script stops early.

```
rng default
x0 = randn(1,2);
opts = optimoptions("fmincon",FiniteDifferenceType="central");
assert(checkGradients(@ras,x0,opts,Display="on"))
assert(all(checkGradients(@unitdiskb,x0,opts, ...
    IsConstraint=true,Display="on")))
[x,fval] = fmincon(@ras,x0,[],[],[],[],[],[],@unitdiskb)
```
```
____________________________________________________________
Objective function derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 7.52301e-10.
checkGradients successfully passed.
____________________________________________________________

____________________________________________________________
Nonlinear inequality constraint derivatives:
Maximum relative difference between supplied and finite-difference
derivatives = 1.
Supplied derivative element (2,1): 1.8324
Finite-difference derivative element (2,1): 3.66479
checkGradients failed.
Supplied derivative and finite-difference approximation
are not within 'Tolerance' (1e-06).
____________________________________________________________

Error using assert
Assertion failed.
```