Reinforcement Learning Toolbox: Discount factor issue

12 views (last 30 days)
Hi,
I am trying to apply some RL algorithms from the RL Toolbox, such as the actor-critic algorithm, to a problem where the reward for each step in an episode is discounted. However, in the Training Manager window the episode reward appears as the cumulative (undiscounted) sum of rewards rather than the discounted sum. I wonder if this is a bug, as it seems confusing.
Thanks,

Answers (3)

Ajay Pattassery on 26 Aug 2019
Edited: Ajay Pattassery on 26 Aug 2019
In the Episode Manager you can view the reward for each episode, labeled Episode Reward. This should be the discounted sum of rewards over the time steps, provided you have set the discount factor in rlACAgentOptions as below.
opt = rlACAgentOptions('DiscountFactor',0.95)
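That is, with step rewards r1, r2, ..., rT, the Episode Reward would be the discounted return r1 + 0.95*r2 + 0.95^2*r3 + ... + 0.95^(T-1)*rT.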
If you are observing that the reward for each episode is not the discounted sum of rewards, please reply with your env, critic, actor, and trainOpts (or the code you used) so we can reproduce the issue.

EBRAHIM ALEBRAHIM on 26 Aug 2019
Hi Ajay,
I already have the discount factor set in the agent options as you mentioned, and the problem still persists. I have a test simulator that returns a reward of 1 for each step in the episode, and I have set the maximum number of steps per episode in the training options to 500, since in my problem the episode never ends and the 'IsDone' variable is therefore always 0. If the episode reward in the Training Manager is supposed to be the discounted reward, then it should be (1 - 0.95^500)/0.05 ≈ 20. But the Training Manager reports 500 (the undiscounted sum of rewards).
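As a quick sanity check of that number in MATLAB (a minimal sketch, assuming a constant reward of 1 per step and a discount factor of 0.95):
gamma = 0.95;
N = 500;
discountedReturn = sum(gamma.^(0:N-1))  % geometric series, (1 - gamma^N)/(1 - gamma) ~= 20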
Thanks
  1 Comment
Ajay Pattassery on 29 Aug 2019
Hello,
I have tried an actor-critic example following the model given in the link, and I can see the effect of the discount factor in that example.



EBRAHIM ALEBRAHIM on 29 Aug 2019
I would appreciate it if you could provide a screenshot of that, because I definitely don't see the effect of discounting, even in the CartPole example. In that example, the episode reward I get is simply the sum of rewards, even though the discount factor is set below 1 (0.99). As you can see from the screenshot below, the episode reward is 10, which is the sum of rewards for 15 successful balancing steps (each giving 1 unit of reward) plus the final step, a failure, which gives -5.
The discounted reward in this situation is supposed to be 1 + 0.99 + 0.99^2 + ... + 0.99^14 + 0.99^15*(-5) ≈ 13.99 - 4.30 ≈ 9.69, not 10.
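For what it's worth, the same check in MATLAB (a small sketch assuming 15 rewards of +1 followed by a terminal reward of -5, with gamma = 0.99):
gamma = 0.99;
r = [ones(1,15) -5];                              % 15 successful steps, then the failure
discountedReturn = sum(gamma.^(0:numel(r)-1).*r)  % ~= 9.69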
The CartPole code that I ran is below, along with a screenshot of the training (I set the maximum number of episodes in the training options to 1).
Thanks
(Screenshot: CartPoleEx.png)
clear
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Critic network: maps the 4-element observation to a scalar state value
criticNetwork = [
    imageInputLayer([4 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(1,'Name','CriticFC')];
criticOpts = rlRepresentationOptions('LearnRate',8e-3,'GradientThreshold',1);
critic = rlRepresentation(criticNetwork,obsInfo,'Observation',{'state'},criticOpts);

% Actor network: maps the observation to the two discrete actions
actorNetwork = [
    imageInputLayer([4 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(2,'Name','action')];
actorOpts = rlRepresentationOptions('LearnRate',8e-3,'GradientThreshold',1);
actor = rlRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'state'},'Action',{'action'},actorOpts);

% Set up the AC agent with a discount factor below 1
agentOpts = rlACAgentOptions(...
    'NumStepsToLookAhead',32, ...
    'DiscountFactor',0.99);
agent = rlACAgent(actor,critic,agentOpts);

% Train the agent for a single episode
rng(0)
trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 1;
trainOpts.MaxStepsPerEpisode = 500;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 500;
trainOpts.ScoreAveragingWindowLength = 5;
trainStats = train(agent,env,trainOpts)
  1 Comment
Ajay Pattassery on 5 Sep 2019
The Episode Manager shows the undiscounted cumulative reward from the environment. The discount factor, however, does affect training and hence the learned policy. You can observe this by comparing the average reward over a reasonable number of episodes with a discount factor close to zero against one close to one.
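A minimal sketch of that comparison, reusing the CartPole setup from the code above (the MaxEpisodes value of 200 is arbitrary, and the actor and critic are recreated so each run starts from scratch):
for gamma = [0.1 0.99]
    critic = rlRepresentation(criticNetwork,obsInfo,'Observation',{'state'},criticOpts);
    actor = rlRepresentation(actorNetwork,obsInfo,actInfo,...
        'Observation',{'state'},'Action',{'action'},actorOpts);
    agentOpts = rlACAgentOptions('NumStepsToLookAhead',32,'DiscountFactor',gamma);
    agent = rlACAgent(actor,critic,agentOpts);
    trainOpts = rlTrainingOptions('MaxEpisodes',200,'MaxStepsPerEpisode',500);
    trainStats = train(agent,env,trainOpts);
    fprintf('gamma = %.2f, mean episode reward = %.1f\n', ...
        gamma, mean(trainStats.EpisodeReward));
end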


Release

R2019a
