
## Bayesian Optimization Workflow

### What Is Bayesian Optimization?

Optimization, in its most general form, is the process of locating a point that minimizes a real-valued function called the objective function. Bayesian optimization is one such process. Bayesian optimization internally maintains a Gaussian process model of the objective function, and uses objective function evaluations to train the model. One innovation in Bayesian optimization is the use of an acquisition function, which the algorithm uses to determine the next point to evaluate. The acquisition function can balance sampling at points that have low modeled objective function values against exploring areas that have not yet been modeled well. For details, see Bayesian Optimization Algorithm.
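For instance, the following minimal sketch runs `bayesopt` (introduced later in this section) on a cheap one-dimensional objective. Internally, `bayesopt` maintains the Gaussian process model and acquisition function described above; the variable name and objective here are illustrative only.

```matlab
% Minimal sketch: Bayesian optimization of a cheap 1-D objective.
% bayesopt fits a Gaussian process model to the evaluated points and
% chooses each new point by maximizing an acquisition function.
xvar = optimizableVariable('x',[-5,5]);          % search range for x
fun  = @(z) sin(3*z.x) + 0.5*z.x.^2;             % stand-in objective (z is a one-row table)
results = bayesopt(fun,xvar, ...
    'MaxObjectiveEvaluations',20,'Verbose',0);
results.XAtMinObjective                          % best point found so far
```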

Bayesian optimization is part of Statistics and Machine Learning Toolbox™ because it is well-suited to optimizing hyperparameters of classification and regression algorithms. A hyperparameter is an internal parameter of a classifier or regression function, such as the box constraint of a support vector machine, or the learning rate of a robust classification ensemble. These parameters can strongly affect the performance of a classifier or regressor, and yet it is typically difficult or time-consuming to optimize them. See Bayesian Optimization Characteristics.

Typically, optimizing the hyperparameters means minimizing the cross-validation loss of a classifier or regression model.

### Two Ways to Perform Bayesian Optimization

You can perform a Bayesian optimization in two distinct ways:

• Fit function — Include the `OptimizeHyperparameters` name-value pair in many fitting functions to have Bayesian optimization apply automatically. The optimization minimizes cross-validation loss. This way gives you fewer tuning options, but enables you to perform Bayesian optimization most easily. See Bayesian Optimization Using a Fit Function.

• `bayesopt` — Exert the most control over your optimization by calling `bayesopt` directly. This way requires you to write an objective function, which does not have to represent cross-validation loss. See Bayesian Optimization Using bayesopt.

### Bayesian Optimization Using a Fit Function

To minimize the error in a cross-validated response via Bayesian optimization, follow these steps.

1. Choose your classification or regression solver among `fitcdiscr`, `fitcecoc`, `fitcensemble`, `fitckernel`, `fitcknn`, `fitclinear`, `fitcnb`, `fitcsvm`, `fitctree`, `fitrensemble`, `fitrgp`, `fitrkernel`, `fitrlinear`, `fitrsvm`, or `fitrtree`.

2. Decide on the hyperparameters to optimize, and pass them in the `OptimizeHyperparameters` name-value pair. For each fit function, you can choose from a set of hyperparameters. See Eligible Hyperparameters for Fit Functions, use the `hyperparameters` function (see the sketch after these steps), or consult the fit function reference page.

You can pass a cell array of parameter names. You can also set `'auto'` as the `OptimizeHyperparameters` value, which chooses a typical set of hyperparameters to optimize, or `'all'` to optimize all available parameters.

3. For ensemble fit functions `fitcecoc`, `fitcensemble`, and `fitrensemble`, also include parameters of the weak learners in the `OptimizeHyperparameters` cell array.

4. Optionally, create an options structure for the `HyperparameterOptimizationOptions` name-value pair. See Hyperparameter Optimization Options for Fit Functions.

5. Call the fit function with the appropriate name-value pairs.
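The sketch below shows one way to use the `hyperparameters` function mentioned in step 2 to list the eligible hyperparameters for a fit function and see which of them are optimized by default. The `ionosphere` data set ships with the toolbox; the loop is illustrative, not required.

```matlab
% Sketch: query the eligible hyperparameters for fitcsvm on a data set.
% hyperparameters returns an array of optimizableVariable objects.
load ionosphere                                  % predictors X, labels Y
params = hyperparameters('fitcsvm',X,Y);
for ii = 1:numel(params)
    fprintf('%-20s optimized by default: %d\n', ...
        params(ii).Name,params(ii).Optimize)
end
```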

For examples, see Optimize an SVM Classifier Fit Using Bayesian Optimization and Optimize a Boosted Regression Ensemble. Also, every fit function reference page contains a Bayesian optimization example.
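For instance, the following minimal sketch (again using the `ionosphere` data set; the option values are illustrative, not recommendations) optimizes two SVM hyperparameters through the fit function interface.

```matlab
% Sketch: Bayesian optimization of an SVM through the fit function interface.
% The optimization minimizes the cross-validation loss of the classifier.
load ionosphere
rng default                                      % for reproducibility
Mdl = fitcsvm(X,Y, ...
    'OptimizeHyperparameters',{'BoxConstraint','KernelScale'}, ...
    'HyperparameterOptimizationOptions', ...
    struct('MaxObjectiveEvaluations',30,'ShowPlots',false));
Mdl.HyperparameterOptimizationResults            % BayesianOptimization object
```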

### Bayesian Optimization Using `bayesopt`

To perform a Bayesian optimization using `bayesopt`, follow these steps.

1. Prepare your variables. See Variables for a Bayesian Optimization.

2. Create your objective function. See Bayesian Optimization Objective Functions. If necessary, create constraints, too. See Constraints in Bayesian Optimization.

3. Decide on options, meaning the `bayesopt` `Name,Value` pairs. You are not required to pass any options to `bayesopt`, but you typically do, especially when trying to improve a solution.

4. Call `bayesopt`.

5. Examine the solution. You can decide to resume the optimization by using `resume`, or restart the optimization, usually with modified options.

For an example, see Optimize a Cross-Validated SVM Classifier Using bayesopt.
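The following hedged sketch expresses a similar SVM hyperparameter search directly with `bayesopt`. Here the objective function returns the cross-validation loss for a candidate point; the variable names and ranges are illustrative.

```matlab
% Sketch: cross-validated SVM loss minimized directly with bayesopt.
load ionosphere
rng default
c = cvpartition(numel(Y),'KFold',5);             % one partition reused by every evaluation
box   = optimizableVariable('box',[1e-3,1e3],'Transform','log');
sigma = optimizableVariable('sigma',[1e-3,1e3],'Transform','log');
minfn = @(z) kfoldLoss(fitcsvm(X,Y,'CVPartition',c, ...
    'BoxConstraint',z.box,'KernelScale',z.sigma));
results = bayesopt(minfn,[box,sigma], ...
    'AcquisitionFunctionName','expected-improvement-plus', ...
    'MaxObjectiveEvaluations',30);
zbest = bestPoint(results)                       % table of the best hyperparameters
% To run further evaluations later, resume from the returned object, for example:
% results = resume(results,'MaxObjectiveEvaluations',10);
```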

### Bayesian Optimization Characteristics

Bayesian optimization algorithms are best suited to these problem types.

| Characteristic | Details |
| --- | --- |
| Low dimension | Bayesian optimization works best in a low number of dimensions, typically 10 or fewer. While Bayesian optimization can solve some problems with a few dozen variables, it is not recommended for dimensions higher than about 50. |
| Expensive objective | Bayesian optimization is designed for objective functions that are slow to evaluate. It has considerable overhead, typically several seconds for each iteration. |
| Low accuracy | Bayesian optimization does not necessarily give very accurate results. If you have a deterministic objective function, you can sometimes improve the accuracy by starting a standard optimization algorithm from the `bayesopt` solution. |
| Global solution | Bayesian optimization is a global technique. Unlike many other algorithms, to search for a global solution you do not have to start the algorithm from various initial points. |
| Hyperparameters | Bayesian optimization is well-suited to optimizing hyperparameters of another function. A hyperparameter is a parameter that controls the behavior of a function. For example, the `fitcsvm` function fits an SVM model to data. It has hyperparameters `BoxConstraint` and `KernelScale` for its `'rbf'` `KernelFunction`. For an example of Bayesian optimization applied to hyperparameters, see Optimize a Cross-Validated SVM Classifier Using bayesopt. |

### Parameters Available for Fit Functions

Eligible Hyperparameters for Fit Functions

| Function Name | Eligible Parameters |
| --- | --- |
| `fitcdiscr` | `Delta`, `Gamma`, `DiscrimType` |
| `fitcecoc` | `Coding`; eligible `fitcdiscr` parameters for `'Learners','discriminant'`; eligible `fitckernel` parameters for `'Learners','kernel'`; eligible `fitcknn` parameters for `'Learners','knn'`; eligible `fitclinear` parameters for `'Learners','linear'`; eligible `fitcsvm` parameters for `'Learners','svm'`; eligible `fitctree` parameters for `'Learners','tree'` |
| `fitcensemble` | `Method`, `NumLearningCycles`, `LearnRate`; eligible `fitcdiscr` parameters for `'Learners','discriminant'`; eligible `fitcknn` parameters for `'Learners','knn'`; eligible `fitctree` parameters for `'Learners','tree'` |
| `fitckernel` | `Learner`, `KernelScale`, `Lambda`, `NumExpansionDimensions` |
| `fitcknn` | `NumNeighbors`, `Distance`, `DistanceWeight`, `Exponent`, `Standardize` |
| `fitclinear` | `Lambda`, `Learner`, `Regularization` |
| `fitcnb` | `DistributionNames`, `Width`, `Kernel` |
| `fitcsvm` | `BoxConstraint`, `KernelScale`, `KernelFunction`, `PolynomialOrder`, `Standardize` |
| `fitctree` | `MinLeafSize`, `MaxNumSplits`, `SplitCriterion`, `NumVariablesToSample` |
| `fitrensemble` | `Method`, `NumLearningCycles`, `LearnRate`; eligible `fitrtree` parameters for `'Learners','tree'`: `MinLeafSize`, `MaxNumSplits`, `NumVariablesToSample` |
| `fitrgp` | `Sigma`, `BasisFunction`, `KernelFunction`, `KernelScale`, `Standardize` |
| `fitrkernel` | `Learner`, `KernelScale`, `Lambda`, `NumExpansionDimensions`, `Epsilon` |
| `fitrlinear` | `Lambda`, `Learner`, `Regularization` |
| `fitrsvm` | `BoxConstraint`, `KernelScale`, `Epsilon`, `KernelFunction`, `PolynomialOrder`, `Standardize` |
| `fitrtree` | `MinLeafSize`, `MaxNumSplits`, `NumVariablesToSample` |

### Hyperparameter Optimization Options for Fit Functions

When optimizing using a fit function, you have these options available in the `HyperparameterOptimizationOptions` name-value pair. Give the value as a structure. All fields in the structure are optional.
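For example, the following hedged sketch builds an options structure and passes it to `fitctree` on the `fisheriris` data set; the field values are illustrative choices, not recommendations. The table that follows lists every field.

```matlab
% Sketch: an options structure for hyperparameter optimization.
% Unspecified fields take the defaults listed in the table below.
load fisheriris                                  % example data shipped with MATLAB
rng default
opts = struct('Optimizer','bayesopt', ...
    'AcquisitionFunctionName','expected-improvement-plus', ...
    'MaxObjectiveEvaluations',40, ...
    'Repartition',true, ...
    'ShowPlots',false);
Mdl = fitctree(meas,species,'OptimizeHyperparameters','auto', ...
    'HyperparameterOptimizationOptions',opts);
```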

| Field Name | Values | Default |
| --- | --- | --- |
| `Optimizer` | `'bayesopt'` — Use Bayesian optimization. Internally, this setting calls `bayesopt`.<br>`'gridsearch'` — Use grid search with `NumGridDivisions` values per dimension.<br>`'randomsearch'` — Search at random among `MaxObjectiveEvaluations` points.<br>`'gridsearch'` searches in a random order, using uniform sampling without replacement from the grid. After optimization, you can get a table in grid order by using the command `sortrows(Mdl.HyperparameterOptimizationResults)`. | `'bayesopt'` |
| `AcquisitionFunctionName` | `'expected-improvement-per-second-plus'`<br>`'expected-improvement'`<br>`'expected-improvement-plus'`<br>`'expected-improvement-per-second'`<br>`'lower-confidence-bound'`<br>`'probability-of-improvement'`<br>Acquisition functions whose names include `per-second` do not yield reproducible results because the optimization depends on the run time of the objective function. Acquisition functions whose names include `plus` modify their behavior when they are overexploiting an area. For more details, see Acquisition Function Types. | `'expected-improvement-per-second-plus'` |
| `MaxObjectiveEvaluations` | Maximum number of objective function evaluations. | `30` for `'bayesopt'` or `'randomsearch'`, and the entire grid for `'gridsearch'` |
| `MaxTime` | Time limit, specified as a positive real. The time limit is in seconds, as measured by `tic` and `toc`. Run time can exceed `MaxTime` because `MaxTime` does not interrupt function evaluations. | `Inf` |
| `NumGridDivisions` | For `'gridsearch'`, the number of values in each dimension. The value can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. This field is ignored for categorical variables. | `10` |
| `ShowPlots` | Logical value indicating whether to show plots. If `true`, this field plots the best objective function value against the iteration number. If there are one or two optimization parameters, and if `Optimizer` is `'bayesopt'`, then `ShowPlots` also plots a model of the objective function against the parameters. | `true` |
| `SaveIntermediateResults` | Logical value indicating whether to save results when `Optimizer` is `'bayesopt'`. If `true`, this field overwrites a workspace variable named `'BayesoptResults'` at each iteration. The variable is a `BayesianOptimization` object. | `false` |
| `Verbose` | Display to the command line:<br>`0` — No iterative display<br>`1` — Iterative display<br>`2` — Iterative display with extra information<br>For details, see the `bayesopt` `Verbose` name-value pair argument. | `1` |
| `UseParallel` | Logical value indicating whether to run Bayesian optimization in parallel, which requires Parallel Computing Toolbox™. For details, see Parallel Bayesian Optimization. | `false` |
| `Repartition` | Logical value indicating whether to repartition the cross-validation at every iteration. If `false`, the optimizer uses a single partition for the optimization. `true` usually gives the most robust results because this setting takes partitioning noise into account. However, for good results, `true` requires at least twice as many function evaluations. | `false` |

Use no more than one of the following three field names.

| Field Name | Values | Default |
| --- | --- | --- |
| `CVPartition` | A `cvpartition` object, as created by `cvpartition`. | `'Kfold',5` if you do not specify any cross-validation field |
| `Holdout` | A scalar in the range `(0,1)` representing the holdout fraction. | |
| `Kfold` | An integer greater than 1. | |