getValue

Obtain estimated value from a critic given environment observations and actions

Description

Value Function Critic

value = getValue(valueFcnAppx,obs) evaluates the value function critic valueFcnAppx and returns the value corresponding to the observation obs. In this case, valueFcnAppx is an rlValueFunction approximator object.

Q-Value Function Critics

value = getValue(vqValueFcnAppx,obs) evaluates the discrete-action-space Q-value function critic vqValueFcnAppx and returns the vector value in which each element represents the estimated value given the state corresponding to the observation obs and the action corresponding to the element number of value. In this case, vqValueFcnAppx is an rlVectorQValueFunction approximator object.

value = getValue(qValueFcnAppx,obs,act) evaluates the Q-value function critic qValueFcnAppx and returns the scalar value representing the value given the observation obs and action act. In this case, qValueFcnAppx is an rlQValueFunction approximator object.

Return Recurrent Neural Network State

[value,state] = getValue(___) also returns the updated state of the critic object when it contains a recurrent neural network.
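
For example, the following sketch builds a value critic around a small LSTM network and retrieves its updated state; the layer sizes are illustrative, not prescribed by this page.

```matlab
% Sketch (illustrative sizes): a value critic containing an LSTM network
obsInfo = rlNumericSpec([4 1]);
net = [ sequenceInputLayer(4)
        lstmLayer(8)
        fullyConnectedLayer(1) ];
critic = rlValueFunction(dlnetwork(net),obsInfo);

% getValue also returns the updated hidden state of the recurrent network
[value,state] = getValue(critic,{rand(4,1)});
```

You can later restore this state in the critic using setState.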

Examples

Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles.

obsInfo = rlNumericSpec([4 1]);

Create a neural network to approximate the value function within the critic. The value is a scalar representing the expected cumulative long-term reward when the agent starts from the given observation and follows a given policy thereafter.

net = [ featureInputLayer(4,'Normalization','none') ...
        fullyConnectedLayer(1,'Name','value')];

Convert the network to a dlnetwork object.

dlnet = dlnetwork(net);

Create a critic using the network and the observation specification object. When you use this syntax, the network input layer is automatically associated with the environment observation according to the dimension specifications in obsInfo.

critic = rlValueFunction(dlnet,obsInfo);

Obtain a value function estimate for a random single observation. Use an observation array with the same dimensions as the observation specification.

val = getValue(critic,{rand(4,1)})
val = single
    0.7904

You can also obtain value function estimates for a batch of observations. For example, obtain value function estimates for a batch of 20 observations.

batchVal = getValue(critic,{rand(4,1,20)});
size(batchVal)
ans = 1×2

     1    20

batchVal contains one value function estimate for each observation in the batch.

Create observation and action specification objects (or alternatively use getObservationInfo and getActionInfo to extract the specification objects from an environment). For this example, define the observation space as a continuous four-dimensional space, so that a single observation is a column vector containing four doubles, and the action space as a finite set consisting of three possible values (named 7, 5, and 3 in this case).

obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([7 5 3]);

Create a deep neural network approximator to approximate the Q-value function within the critic. The input of the network must accept a four-element vector, as defined by obsInfo. The network must have a single output layer with as many elements as the number of possible discrete actions (three in this case, as defined by actInfo). Convert the network to a dlnetwork object.

net = [featureInputLayer(4,'Normalization','none') 
       fullyConnectedLayer(3,'Name','value')];
net = dlnetwork(net);

Create the critic using the network, as well as the observation and action specification objects. The network input layers are automatically associated with the components of the observation signals according to the dimension specifications in obsInfo.

critic = rlVectorQValueFunction(net,obsInfo,actInfo);

Use getValue to return the values of a random observation, using the current network weights.

v = getValue(critic,{rand(4,1)})
v = 3x1 single column vector

    0.7232
    0.8177
   -0.2212

v contains three value function estimates, one for each possible discrete action.

You can also obtain value function estimates for a batch of observations. For example, obtain value function estimates for a batch of 10 observations.

batchV = getValue(critic,{rand(4,1,10)});
size(batchV)
ans = 1×2

     3    10

batchV contains three value function estimates for each observation in the batch.

Create observation and action specification objects (or alternatively use getObservationInfo and getActionInfo to extract the specification objects from an environment). For this example, define the observation space as having two continuous channels, the first carrying an 8-by-3 matrix and the second a continuous four-dimensional vector. The action specification is a continuous column vector containing two doubles.

obsInfo = [rlNumericSpec([8 3]), rlNumericSpec([4 1])];
actInfo = rlNumericSpec([2 1]);

Create a custom basis function and its initial weight matrix.

myBasisFcn = @(obsA,obsB,act) [...
    ones(30,1);
    obsA(:); obsB(:); act(:);
    obsA(:).^2; obsB(:).^2; act(:).^2;
    sin(obsA(:)); sin(obsB(:)); sin(act(:));
    cos(obsA(:)); cos(obsB(:)); cos(act(:))];
W0 = rand(150,1);

The output of the critic is the scalar W'*myBasisFcn(obsA,obsB,act), representing the Q-value function to be approximated.

Create the critic.

critic = rlQValueFunction({myBasisFcn,W0}, ...
    obsInfo,actInfo);
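
As a sanity check, you can compare the critic output against the basis-function formula directly; this sketch assumes the critic and W0 created above (note that the critic's learned parameter may differ from W0 if it has been updated):

```matlab
% Sketch: verify the critic output against W0'*myBasisFcn(obsA,obsB,act)
obsA = rand(8,3);
obsB = (1:4)';
act  = rand(2,1);

vCritic = getValue(critic,{obsA,obsB},{act});
vManual = W0'*myBasisFcn(obsA,obsB,act);   % should match vCritic
```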

Use getValue to return the value of a random observation-action pair, using the current parameter matrix.

v = getValue(critic,{rand(8,3),(1:4)'},{rand(2,1)})
v = 68.8628

Create a random observation set with a batch size of 64 for each channel. The third dimension is the batch size, while the fourth is the sequence length for any recurrent neural network used by the critic (not used in this case).

batchobs_ch1 = rand(8,3,64,1);
batchobs_ch2 = rand(4,1,64,1);

Create a random action set of batch size 64.

batchact = rand(2,1,64,1);

Obtain the state-action value function estimate for the batch of observations and actions.

bv = getValue(critic,{batchobs_ch1,batchobs_ch2},{batchact});
size(bv)
ans = 1×2

     1    64

bv(23)
ans = 46.6310

Input Arguments

Value function critic, specified as an rlValueFunction approximator object.

Vector Q-value function critic, specified as an rlVectorQValueFunction approximator object.

Q-value function critic, specified as an rlQValueFunction object.

Observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

  • MO corresponds to the dimensions of the associated observation input channel.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If the critic object given as first input argument has multiple observation input channels, then LB must be the same for all elements of obs.

  • LS specifies the sequence length for a recurrent neural network. If the critic object given as first input argument does not use a recurrent neural network, then LS = 1. If the critic object has multiple observation input channels, then LS must be the same for all elements of obs.

LB and LS must be the same for both act and obs.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.
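
The batch and sequence dimensions compose as follows; this sketch assumes critic is a value function critic that contains a recurrent neural network and has a single 4-by-1 observation channel (a hypothetical setup, not defined on this page).

```matlab
% Sketch: a batch of LB = 8 observation sequences of length LS = 5
% for a single 4-by-1 observation channel (dimensions MO-by-LB-by-LS)
batchSeqObs = rand(4,1,8,5);

% Assumes critic contains a recurrent neural network
[value,state] = getValue(critic,{batchSeqObs});
size(value)   % 1-by-8-by-5: one estimate per batch element and time step
```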

Action, specified as a single-element cell array that contains an array of action values.

The dimensions of this array are MA-by-LB-by-LS, where:

  • MA corresponds to the dimensions of the associated action specification.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1.

  • LS specifies the sequence length for a recurrent neural network. If the critic object given as a first input argument does not use a recurrent neural network, then LS = 1.

LB and LS must be the same for both act and obs.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.

Output Arguments

Estimated value function, returned as an array with dimensions N-by-LB-by-LS, where:

  • N is the number of outputs of the critic network.

    • For a state value critic (valueFcnAppx), N = 1.

    • For a single-output state-action value function critic (qValueFcnAppx), N = 1.

    • For a multi-output state-action value function critic (vqValueFcnAppx), N is the number of discrete actions.

  • LB is the batch size.

  • LS is the sequence length for a recurrent neural network.

Updated state of the critic, returned as a cell array. If the critic does not use a recurrent neural network, then state is an empty cell array.

You can set the state of the critic to state using the setState function. For example:

valueFcnAppx = setState(valueFcnAppx,state);

Tips

The more general function evaluate behaves, for critic objects, similarly to getValue except that evaluate returns results inside a single-cell array.
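
For example, assuming critic is a value function critic such as the rlValueFunction object created earlier, the two calls differ only in how the result is wrapped:

```matlab
obs  = {rand(4,1)};
val  = getValue(critic,obs);    % numeric array
valC = evaluate(critic,obs);    % cell array; valC{1} holds the numeric result
```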

Version History

Introduced in R2020a