Creating an actorLossFunction for Continuous​Determinis​ticActor

6 visualizzazioni (ultimi 30 giorni)
Hi in the example the actor loss function is the following for a rlDiscreteCategoricalActor
function loss = actorLossFunction(policy, lossData)
policy = policy{1};
% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Elements',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));
% Round any policy values less than eps to eps.
policy(policy < eps) = eps;
% Compute the loss.
loss = -sum(G .* log(policy),'all');
end
Here is my
actInfo =
rlNumericSpec with properties:
LowerLimit: [2×1 double]
UpperLimit: [2×1 double]
Name: "CartPole Action"
Description: [0×0 string]
Dimension: [2 1]
DataType: "double"
obsInfo =
rlNumericSpec with properties:
LowerLimit: -Inf
UpperLimit: Inf
Name: "CartPole States"
Description: "pendulum_force, cart position, cart velocity"
Dimension: [4 1501]
DataType: "double"
Here is how I set my actor
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
actor = accelerate(actor,true);
actorOpts = rlOptimizerOptions('LearnRate',1e-3);
actorOptimizer = rlOptimizer(actorOpts);
To create my loss function can I do the following?
function loss = actorLossFunction(policy, lossData)
policy = policy{1};
% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Dimension(1)',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));
% Round any policy values less than eps to eps.
policy(policy < eps) = eps;
% Compute the loss.
loss = -sum(G .* log(policy),'all');
end

Risposta accettata

Takeshi Takahashi
Takeshi Takahashi il 2 Giu 2022
Please take a look at this example for rlContinuousDeterministicActor if you want to use it in a custom training loop.
rlDiscreteCategoricalActor is for stochastic discrete actions while rlContinuousDeterministicActor is for deterministic continuous actions. You need different formulations.

Più risposte (0)

Categorie

Scopri di più su Policies and Value Functions in Help Center e File Exchange

Prodotti


Release

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by