getMaxQValue
Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
Syntax
Description
[maxQ,maxActionIndex] = getMaxQValue(QValueFcn,obs)
evaluates the discrete-action-space Q-value function critic QValueFcn
and returns the maximum estimated value over all possible actions maxQ,
along with the index maxActionIndex of the corresponding action, given
environment observations obs.
[maxQ,maxActionIndex,nextState] = getMaxQValue(___)
also returns the updated state of QValueFcn when it contains a
recurrent neural network.
___ = getMaxQValue(___,UseForward=useForward)
allows you to explicitly call a forward pass when computing gradients.
Examples
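A minimal sketch of the basic syntax. The observation specification, action values, and network architecture below are illustrative assumptions, not taken from this page; they assume Reinforcement Learning Toolbox and Deep Learning Toolbox are available.

```matlab
% Define a 4-dimensional continuous observation space and 3 discrete actions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% A small network that outputs one Q-value per discrete action.
net = dlnetwork([
    featureInputLayer(4)
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(3)]);

% Create a vector Q-value critic and query it for a random observation.
critic = rlVectorQValueFunction(net,obsInfo,actInfo);
[maxQ,maxActionIndex] = getMaxQValue(critic,{rand(4,1)});
```

Here maxQ is the largest of the three estimated Q-values, and maxActionIndex is the index (1, 2, or 3) of the corresponding action in actInfo.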
Input Arguments
Output Arguments
Tips
When the elements of the cell array obs are
dlarray objects, the values returned by
getMaxQValue are also dlarray objects. This allows
getMaxQValue to be used with automatic differentiation.
Specifically, you can write a custom loss function that directly uses
getMaxQValue and dlgradient within
it, and then use the dlfeval and
dlaccelerate
functions with your custom loss function. For an example, see Train Reinforcement Learning Policy Using Custom Training Loop and Custom Training Loop with Simulink Action Noise.
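The pattern above can be sketched as follows. This is an illustrative outline only: the loss function, target value, and variable names are assumptions, with critic taken to be a discrete-action Q-value critic as in the earlier example.

```matlab
% Illustrative custom loss (assumed names): getMaxQValue and dlgradient
% inside one function, intended to be evaluated with dlfeval so the
% dlarray computation is traced for automatic differentiation.
function [loss,gradients] = criticLoss(critic,obs,target)
    % UseForward=true explicitly requests a forward pass, which matters
    % for layers such as batch normalization or dropout during training.
    maxQ = getMaxQValue(critic,obs,UseForward=true);
    loss = mean((maxQ - target).^2,"all");
    gradients = dlgradient(loss,getLearnableParameters(critic));
end

% Example call, with a dlarray observation in a cell array:
% [loss,grads] = dlfeval(@criticLoss,critic,{dlarray(rand(4,1),"CB")},1);
```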
Version History
Introduced in R2020a