Reward in training manager higher than should be

Question

Mohammed Eleffendi on 10 Mar 2021

https://it.mathworks.com/matlabcentral/answers/768027-reward-in-training-manager-higher-than-should-be

Commented: Gaurav Shetty on 14 Sep 2021

Accepted answer: Mohammed Eleffendi

Hi,

I am trying to train a reinforcement learning agent, and I have the environment set up in Simulink. I'm facing two issues:

1- The reward in the training manager appears to be much higher than it should be. As shown in the picture below, the scope connected to the reward signal shows a reward value of 1, which is correct. However, the training manager shows 70, which is not correct.

2- After a number of episodes, the training stops and I get an error message:

Error using rl.env.AbstractEnv/simWithPolicy (line 82)
An error occurred while simulating "ADSTestBed" with the agent "falsifier_agent".
Error in rl.task.SeriesTrainTask/runImpl (line 33)
            [varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
            [varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
            [varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
            [this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
                runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
                runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
            run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 421)
            run(trainer);
Error in rl.train.TrainingManager/run (line 211)
            train(this);
Error in rl.agent.AbstractAgent/train (line 78)
    TrainingStatistics = run(trainMgr);
Error in ADSTestBedScript (line 121)
trainingStats = train(falsifier_agent,env,trainOpts);
Caused by:
    Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
    Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
        Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
        Unable to compute gradient from representation.
            Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
            Error using 'backwardLoss' in Layer rl.layer.FcnLossLayer. The function threw an
            error and could not be executed.
                Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
                Number of elements must not change. Use [] as one of the size inputs to
                automatically calculate the appropriate size for that dimension.

I should mention that I have another agent in the Simulink model, but that agent is not being trained.

Version: R2020b

Any help is appreciated. Thanks

Answer 1

Mohammed Eleffendi on 18 Mar 2021

https://it.mathworks.com/matlabcentral/answers/768027-reward-in-training-manager-higher-than-should-be#answer_650882

For the first issue: the reward shown in the training manager is the cumulative episode reward, whereas the scope plots the reward at every time step. So the value in the training manager is correct and there is no issue here.
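To make the difference concrete, here is a minimal sketch in plain MATLAB. It assumes an episode of 70 time steps with a constant per-step reward of 1; the episode length is my assumption, chosen only because it reproduces the numbers in the question.

```matlab
% Illustration only: numSteps is an assumed episode length, not taken
% from the original model.
numSteps = 70;

% What the scope shows: one reward value per time step (always 1 here).
stepRewards = ones(numSteps, 1);

% What the training manager shows: the sum of rewards over the episode.
episodeReward = sum(stepRewards);

fprintf('Last per-step reward: %g\n', stepRewards(end));
fprintf('Cumulative episode reward: %g\n', episodeReward);
```

With these assumed values, the per-step reward stays at 1 while the cumulative episode reward is 70, so both displays can be correct at the same time.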

For the second issue: it turns out that this error occurs if you have 'UseDevice' set to 'gpu'. Change it to 'cpu' and the error disappears. Support is investigating what causes this issue.
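As a rough sketch of that change, assuming the R2020b-style workflow where the device is chosen through rlRepresentationOptions when the actor/critic representations are created: the names criticNet, obsInfo, actInfo, 'state' and 'action' in the commented line are hypothetical placeholders, since the original script was not posted.

```matlab
% Requires Reinforcement Learning Toolbox (R2020b-era API).
% Select the CPU for representation computations instead of the GPU.
repOpts = rlRepresentationOptions('UseDevice', 'cpu');

% Pass repOpts as the last argument when constructing each representation.
% Hypothetical example (network and spec names are placeholders):
% critic = rlQValueRepresentation(criticNet, obsInfo, actInfo, ...
%     'Observation', {'state'}, 'Action', {'action'}, repOpts);
```

The agent has to be rebuilt from the representations created with the new options before calling train again.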

Answer 2

Emmanouil Tzorakoleftherakis on 11 Mar 2021

https://it.mathworks.com/matlabcentral/answers/768027-reward-in-training-manager-higher-than-should-be#answer_645267

I cannot be sure about the error, but it seems that somewhere in your setup the number of parameters/inputs is changing (check the inputs to the RL Agent block).
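One possible way to start that check from the command line is to compare the specifications held by the environment with those held by the agent. This is only a sketch: it reuses the variable names env and falsifier_agent from the error trace, and it assumes the environment was created for a single RL Agent block, so that the specs come back as spec objects rather than cell arrays. It does not inspect the signals actually wired into the block, which still need to be checked in the model.

```matlab
% Requires Reinforcement Learning Toolbox. env and falsifier_agent are the
% workspace variables named in the error trace.
envObs = getObservationInfo(env);
agtObs = getObservationInfo(falsifier_agent);
envAct = getActionInfo(env);
agtAct = getActionInfo(falsifier_agent);

% Compare the declared dimensions channel by channel
% (assumes a single-agent environment, see the note above).
obsMatch = isequal({envObs.Dimension}, {agtObs.Dimension});
actMatch = isequal({envAct.Dimension}, {agtAct.Dimension});

fprintf('Observation dimensions match: %d\n', obsMatch);
fprintf('Action dimensions match: %d\n', actMatch);
```

If either comparison prints 0, the declared sizes disagree and that mismatch is worth fixing first.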

For your first question: the individual reward at each time step is different from the episode reward shown in the Episode Manager. The latter sums the individual rewards over all time steps of an episode.