# evaluate

Evaluate a function approximator object given observation (or observation-action) input data.

## Examples

### Evaluate a Function Approximator Object

This example shows you how to evaluate a function approximator object (that is, an actor or a critic). For this example, the function approximator object is a discrete categorical actor. You evaluate it given some observation data, obtaining in return the action probability distribution and the updated network state.

Load the same environment used in Train PG Agent to Balance Cart-Pole System, and obtain the observation and action specifications.

```
env = rlPredefinedEnv('CartPole-Discrete');
obsInfo = getObservationInfo(env)
```

```
obsInfo = 
  rlNumericSpec with properties:

     LowerLimit: -Inf
     UpperLimit: Inf
           Name: "CartPole States"
    Description: "x, dx, theta, dtheta"
      Dimension: [4 1]
       DataType: "double"
```

```
actInfo = getActionInfo(env)
```

```
actInfo = 
  rlFiniteSetSpec with properties:

       Elements: [-10 10]
           Name: "CartPole Action"
    Description: [0x0 string]
      Dimension: [1 1]
       DataType: "double"
```

Create a deep neural network for the actor.

```
actorNetwork = [
    sequenceInputLayer(prod(obsInfo.Dimension), ...
        'Normalization','none','Name','state')
    fullyConnectedLayer(8,'Name','fc')
    reluLayer('Name','relu')
    lstmLayer(8,'OutputMode','sequence','Name','lstm')
    fullyConnectedLayer(numel(actInfo.Elements))
    ];
actorNetwork = dlnetwork(actorNetwork);
```

Create a stochastic actor representation for the network.

```
actor = rlDiscreteCategoricalActor(actorNetwork, ...
    obsInfo,actInfo, ...
    'Observation','state');
```

Use `evaluate` to return the probability of each of the two possible actions. Note that the type of the returned numbers is `single`, not `double`.

```
[prob,state] = evaluate(actor,{rand(obsInfo.Dimension)});
prob{1}
```

```
ans = 2x1 single column vector

    0.4847
    0.5153
```

Since a recurrent neural network is used for the actor, the second output argument, representing the updated state of the neural network, is not empty. In this case, it contains the updated (cell and hidden) states for the eight units of the `lstm` layer used in the network.

```
state{:}
```

```
ans = 8x1 single column vector

   -0.0833
    0.0619
   -0.0066
   -0.0651
    0.0714
   -0.0957
    0.0614
   -0.0326

ans = 8x1 single column vector

   -0.1367
    0.1142
   -0.0158
   -0.1820
    0.1305
   -0.1779
    0.0947
   -0.0833
```

You can use `getState` and `setState` to extract and set the current state of the actor.

```
getState(actor)
```

```
ans = 2x1 cell array
    {8x1 single}
    {8x1 single}
```

```
actor = setState(actor, ...
    {-0.01*single(rand(8,1)), ...
    0.01*single(rand(8,1))});
```

You can obtain action probabilities and updated states for a batch of observations. For example, use a batch of 5 independent observations.

```
obsBatch = reshape(1:20,4,1,5,1);
[prob,state] = evaluate(actor,{obsBatch})
```

```
prob = 1x1 cell array
    {2x5 single}

state = 2x1 cell array
    {8x5 single}
    {8x5 single}
```

The output arguments contain action probabilities and updated states for each observation in the batch.

Note that the actor treats observation data along the batch length dimension independently, not sequentially.

```
prob{1}
```

```
ans = 2x5 single matrix

    0.5187    0.5869    0.6048    0.6124    0.6155
    0.4813    0.4131    0.3952    0.3876    0.3845
```

```
prob = evaluate(actor,{obsBatch(:,:,[5 4 3 1 2])});
prob{1}
```

```
ans = 2x5 single matrix

    0.6155    0.6124    0.6048    0.5187    0.5869
    0.3845    0.3876    0.3952    0.4813    0.4131
```

To evaluate the actor using sequential observations, use the sequence length (time) dimension. For example, obtain action probabilities for 5 independent sequences, each consisting of 9 sequential observations.

```
[prob,state] = evaluate(actor, ...
{rand([obsInfo.Dimension 5 9])})
```

```
prob = 1x1 cell array
    {2x5x9 single}

state = 2x1 cell array
    {8x5 single}
    {8x5 single}
```

The first output argument contains a vector of two probabilities (first dimension) for each element of the observation batch (second dimension) and for each time element of the sequence length (third dimension).

The second output argument contains the two matrices of final states, one for each sequence in the batch (that is, the network maintains a separate state history for each independent sequence).

Display the probability of the second action, after the seventh sequential observation in the fourth independent sequence.

```
prob{1}(2,4,7)
```

```
ans = single
    0.5675
```

For more information on input and output format for recurrent neural networks, see the Algorithms section of `lstmLayer`.

## Input Arguments

`fcnAppx` — Function approximator object
`rlValueFunction` object | `rlQValueFunction` object | `rlVectorQValueFunction` object | `rlDiscreteCategoricalActor` object | `rlContinuousDeterministicActor` object | `rlContinuousGaussianActor` object | `rlContinuousDeterministicTransitionFunction` object | `rlContinuousGaussianTransitionFunction` object | `rlContinuousDeterministicRewardFunction` object | `rlContinuousGaussianRewardFunction` object | `rlIsDoneFunction` object

Function approximator object, specified as one of the following:

- `rlValueFunction` object
- `rlQValueFunction` object
- `rlVectorQValueFunction` object
- `rlDiscreteCategoricalActor` object
- `rlContinuousDeterministicActor` object
- `rlContinuousGaussianActor` object
- `rlContinuousDeterministicTransitionFunction` object
- `rlContinuousGaussianTransitionFunction` object
- `rlContinuousDeterministicRewardFunction` object
- `rlContinuousGaussianRewardFunction` object
- `rlIsDoneFunction` object

`inData` — Input data for the function approximator
cell array

Input data for the function approximator, specified as a cell array with as many elements as the number of input channels of `fcnAppx`. In the following list, the number of observation channels is indicated by *N*<sub>O</sub>.

- If `fcnAppx` is an `rlQValueFunction`, an `rlContinuousDeterministicTransitionFunction`, or an `rlContinuousGaussianTransitionFunction` object, then each of the first *N*<sub>O</sub> elements of `inData` must be a matrix representing the current observation from the corresponding observation channel. They must be followed by a final matrix representing the action.
- If `fcnAppx` is a function approximator object representing an actor or critic (but not an `rlQValueFunction` object), `inData` must contain *N*<sub>O</sub> elements, each one a matrix representing the current observation from the corresponding observation channel.
- If `fcnAppx` is an `rlContinuousDeterministicRewardFunction`, an `rlContinuousGaussianRewardFunction`, or an `rlIsDoneFunction` object, then each of the first *N*<sub>O</sub> elements of `inData` must be a matrix representing the current observation from the corresponding observation channel. They must be followed by a matrix representing the action, and finally by *N*<sub>O</sub> elements, each one a matrix representing the next observation from the corresponding observation channel.

Each element of `inData` must be a matrix of dimension *M*<sub>C</sub>-by-*L*<sub>B</sub>-by-*L*<sub>S</sub>, where:

- *M*<sub>C</sub> corresponds to the dimensions of the associated input channel.
- *L*<sub>B</sub> is the batch size. To specify a single observation, set *L*<sub>B</sub> = 1. To specify a batch of (independent) inputs, specify *L*<sub>B</sub> > 1. If `inData` has multiple elements, then *L*<sub>B</sub> must be the same for all elements of `inData`.
- *L*<sub>S</sub> specifies the sequence length (the length of the sequence of inputs along the time dimension) for a recurrent neural network. If `fcnAppx` does not use a recurrent neural network (which is the case for environment function approximators, as they do not support recurrent neural networks), then *L*<sub>S</sub> = 1. If `inData` has multiple elements, then *L*<sub>S</sub> must be the same for all elements of `inData`.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of `lstmLayer`.

**Example:** `{rand(8,3,64,1),rand(4,1,64,1),rand(2,1,64,1)}`
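As a sketch of these dimension rules, the following code builds `inData` for a function approximator with a single 4-by-1 observation channel and a recurrent network. The variable names here are illustrative only.

```matlab
% Build the input cell array for a batch of 3 independent sequences,
% each 5 time steps long, for a single 4-by-1 observation channel.
obsDim = [4 1];      % M_C: dimensions of the observation channel
batchSize = 3;       % L_B: number of independent inputs in the batch
seqLength = 5;       % L_S: sequence length (recurrent network only)
inData = {rand([obsDim batchSize seqLength])};   % 4-by-1-by-3-by-5 array
```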

## Output Arguments

`outData` — Output data from the evaluation of the function approximator object
cell array

Output data from the evaluation of the function approximator object, returned as a cell array. In the following list, the number of observation channels is indicated by *N*<sub>O</sub>.

- If `fcnAppx` is an `rlContinuousDeterministicTransitionFunction` object, then `outData` contains *N*<sub>O</sub> matrices, each one representing the predicted observation from the corresponding observation channel.
- If `fcnAppx` is an `rlContinuousGaussianTransitionFunction` object, then each of the first *N*<sub>O</sub> elements of `outData` is a matrix representing the mean value of the predicted observation for the corresponding observation channel. Each of the following *N*<sub>O</sub> elements of `outData` is a matrix representing the standard deviation of the predicted observation for the corresponding observation channel.
- If `fcnAppx` is an `rlContinuousGaussianActor` object, then `outData` is a two-element cell array in which the two elements are matrices representing the mean value and standard deviation of the action, respectively.
- If `fcnAppx` is an `rlDiscreteCategoricalActor` object, then `outData` is a single-element cell array containing a matrix with the probabilities of each action.
- If `fcnAppx` is an `rlContinuousDeterministicActor` object, then `outData` is a single-element cell array containing a matrix with the action.
- If `fcnAppx` is an `rlVectorQValueFunction` object, then `outData` is a single-element cell array containing a matrix with the values of each possible action.
- If `fcnAppx` is an `rlQValueFunction` object, then `outData` is a single-element cell array containing a matrix with the value of the action.
- If `fcnAppx` is an `rlValueFunction` object, then `outData` is a single-element cell array containing a matrix with the value of the current observation.
- If `fcnAppx` is an `rlContinuousDeterministicRewardFunction` object, then `outData` is a single-element cell array containing a matrix with the reward predicted for the current observation, the action, and the next observation following the action.
- If `fcnAppx` is an `rlContinuousGaussianRewardFunction` object, then `outData` is a two-element cell array in which the two elements are matrices representing the mean value and standard deviation, respectively, of the reward predicted for the current observation, the action, and the next observation following the action.
- If `fcnAppx` is an `rlIsDoneFunction` object, then `outData` is a single-element cell array containing a vector with the probabilities of the predicted termination status being `0` (no termination predicted) or `1` (termination predicted), respectively. In general, these probabilities depend on the values of the observation, action, and next observation following the action.

Each element of `outData` is a matrix of dimension *D*-by-*L*<sub>B</sub>-by-*L*<sub>S</sub>, where:

- *D* is the vector of dimensions of the corresponding output channel of `fcnAppx`. Depending on the type of approximator function, this channel can carry a predicted observation (or its mean value or standard deviation), an action (or its mean value or standard deviation), the value (or values) of an observation (or observation-action pair), a predicted reward, or a predicted termination status.
- *L*<sub>B</sub> is the batch size (the length of a batch of independent inputs).
- *L*<sub>S</sub> is the sequence length (the length of the sequence of inputs along the time dimension) for a recurrent neural network. If `fcnAppx` does not use a recurrent neural network (which is the case for environment function approximators, as they do not support recurrent neural networks), then *L*<sub>S</sub> = 1.
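As a sketch of these output dimensions, assuming the discrete categorical actor created in the example above (two actions, one 4-by-1 observation channel), evaluating a batch of 3 single-step observations yields a matrix whose first dimension matches the number of actions and whose second matches the batch size:

```matlab
% outData{1} has size D-by-L_B-by-L_S; here D = 2 (actions), L_B = 3, L_S = 1.
outData = evaluate(actor,{rand([obsInfo.Dimension 3 1])});
size(outData{1})
```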

**Note**

If `fcnAppx` is a critic, then `evaluate` behaves identically to `getValue` except that it returns results inside a single-cell array. If `fcnAppx` is an `rlContinuousDeterministicActor` actor, then `evaluate` behaves identically to `getAction`. If `fcnAppx` is a stochastic actor such as an `rlDiscreteCategoricalActor` or `rlContinuousGaussianActor`, then `evaluate` returns the action probability distribution, while `getAction` returns a sample action. Specifically, for an `rlDiscreteCategoricalActor` actor object, `evaluate` returns the probability of each possible action. For an `rlContinuousGaussianActor` actor object, `evaluate` returns the mean and standard deviation of the Gaussian distribution. For these kinds of actors, see also the note in `getAction` regarding the enforcement of constraints set by the action specification.
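To see the difference in practice, you can compare the two calls on the discrete categorical actor from the example above (a sketch; any observation of the correct size works):

```matlab
obs  = {rand(obsInfo.Dimension)};
prob = evaluate(actor,obs);    % probability of each of the two actions
act  = getAction(actor,obs);   % one action sampled from that distribution
```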

**Note**

If `fcnAppx` is an `rlContinuousDeterministicRewardFunction` object, then `evaluate` behaves identically to `predict` except that it returns results inside a single-cell array. If `fcnAppx` is an `rlContinuousDeterministicTransitionFunction` object, then `evaluate` behaves identically to `predict`. If `fcnAppx` is an `rlContinuousGaussianTransitionFunction` object, then `evaluate` returns the mean value and standard deviation of the observation probability distribution, while `predict` returns an observation sampled from this distribution. Similarly, for an `rlContinuousGaussianRewardFunction` object, `evaluate` returns the mean value and standard deviation of the reward probability distribution, while `predict` returns a reward sampled from this distribution. Finally, if `fcnAppx` is an `rlIsDoneFunction` object, then `evaluate` returns the probabilities of the termination status being false or true, respectively, while `predict` returns a predicted termination status sampled with these probabilities.

`state` — Updated state of the function approximator object
cell array

Next state of the function approximator object, returned as a cell array. If `fcnAppx` does not use a recurrent neural network (which is the case for environment function approximators), then `state` is an empty cell array.

You can set the state of the approximator to `state` using the `setState` function. For example:

```
critic = setState(critic,state);
```

## Version History

**Introduced in R2022a**

## See Also

`getValue` | `getAction` | `getMaxQValue` | `rlValueFunction` | `rlQValueFunction` | `rlVectorQValueFunction` | `rlContinuousDeterministicActor` | `rlDiscreteCategoricalActor` | `rlContinuousGaussianActor` | `rlContinuousDeterministicTransitionFunction` | `rlContinuousGaussianTransitionFunction` | `rlContinuousDeterministicRewardFunction` | `rlContinuousGaussianRewardFunction` | `rlIsDoneFunction` | `accelerate` | `gradient` | `predict`
