Reward in training manager higher than should be

I am trying to train a reinforcement learning agent and I have the environment set up in Simulink. I'm facing two issues:
1- The reward in the training manager appears to be much higher than it should be. As shown in the picture below, the scope connected to the reward signal shows a reward value of 1, which is correct. However, in the training manager it is 70, which is not correct.
2- After a number of episodes, the training stops and I get an error message:
Error using rl.env.AbstractEnv/simWithPolicy (line 82)
An error occurred while simulating "ADSTestBed" with the agent "falsifier_agent".
Error in rl.task.SeriesTrainTask/runImpl (line 33)
[varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
[varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
[varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
[this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
Error in rl.task.TaskSpec/run (line 69)
Error in rl.train.SeriesTrainer/run (line 24)
Error in rl.train.TrainingManager/train (line 421)
Error in rl.train.TrainingManager/run (line 211)
Error in rl.agent.AbstractAgent/train (line 78)
TrainingStatistics = run(trainMgr);
Error in ADSTestBedScript (line 121)
trainingStats = train(falsifier_agent,env,trainOpts);
Caused by:
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Unable to compute gradient from representation.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Error using 'backwardLoss' in Layer rl.layer.FcnLossLayer. The function threw an
error and could not be executed.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Number of elements must not change. Use [] as one of the size inputs to
automatically calculate the appropriate size for that dimension.
I should mention that there is another agent in the Simulink model, but that agent is not being trained.
Version: R2020b
Any help is appreciated. Thanks

Accepted Answer

Mohammed Eleffendi on 18 Mar 2021
For the first issue: the reward in the training manager is the cumulative episode reward, whereas the scope plots the reward at every time step. So the value in the training manager is correct; there is no issue here.
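To make the relationship concrete, here is a minimal sketch in MATLAB. The per-step reward of 1 comes from the question; the episode length of 70 steps is an assumption chosen to reproduce the value seen in the training manager:

```matlab
% Illustrative only: a constant per-step reward of 1 accumulated over a
% 70-step episode yields the episode reward shown in the training manager.
stepReward = 1;        % value seen on the scope at each time step
numSteps   = 70;       % assumed episode length, e.g. Tf/Ts
episodeReward = numSteps * stepReward;   % cumulative episode reward = 70
```

In general, the episode reward is the sum of the step rewards over the whole episode, so it scales with episode length even when each step's reward looks small on the scope.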
For the second issue, it turns out that if you have 'UseDevice' set to 'gpu' you will encounter this error. Change it to 'cpu' and the error disappears. Support is investigating what is causing this issue.
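A minimal sketch of where this option lives, assuming the critic is built with rlQValueRepresentation (the network and signal names below are illustrative, not from the original post):

```matlab
% Workaround sketch: run representation computations on the CPU instead
% of the GPU. rlRepresentationOptions and its 'UseDevice' property are
% part of Reinforcement Learning Toolbox in R2020b.
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'UseDevice','cpu');

% 'criticNet', 'obsInfo', 'actInfo' and the port names are placeholders
% for whatever your own agent setup defines.
critic = rlQValueRepresentation(criticNet,obsInfo,actInfo, ...
    'Observation',{'state'},'Action',{'action'},criticOpts);
```

If your agent also has an actor representation, apply the same 'UseDevice','cpu' option there as well.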

More Answers (1)

Emmanouil Tzorakoleftherakis
I cannot be sure about the error, but it seems that somewhere in your setup you are changing the number of parameters/inputs (check the inputs to the RL Agent block).
For your first question, the individual reward at each time step differs from the episode reward shown in the Episode Manager; the latter sums the individual rewards over all time steps of an episode.

