Why do the DDPG episode rewards never change during the whole training process?

Question

Guoge Tan on 25 May 2020

Direct link to this question

https://it.mathworks.com/matlabcentral/answers/532933-why-is-the-ddpg-episode-rewards-never-change-during-the-whole-training-process

Commented: Shahriar on 29 Jun 2022

Accepted Answer: Emmanouil Tzorakoleftherakis

I'm training a DDPG agent using Reinforcement Learning Toolbox in MATLAB R2020a for a path planning problem. But as you can see, the DDPG episode rewards and average rewards never change over 5000 episodes. I used a simple neural network with 20 neurons and three layers, the learning rate is set to 0.01, and the gradient threshold is 1. I then tried setting the weights and biases of the fully connected layers and changing my reward function, but the result was the same.

I also saw here that others have had a similar problem. Does anyone have advice for my problem? Thank you.

1 Comment

Shahriar on 29 Jun 2022

@Guoge Tan were you able to solve this issue? I have a similar situation.

Answer 1

Emmanouil Tzorakoleftherakis on 26 May 2020

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/532933-why-is-the-ddpg-episode-rewards-never-change-during-the-whole-training-process#answer_439593

It looks like the scales of Q0 and the episode reward are very different. Try unchecking "Show Episode Q0" to see if the episode reward changes. I would then simplify the critic network to make sure it outputs values on a similar scale to the episode reward.
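
For illustration, here is a minimal sketch of what a simplified critic might look like in R2020a. It is not the network from the question: the observation and action sizes (numObs, numAct), the action limits, the layer names, and the learning rate of 1e-3 are all placeholder assumptions to be replaced with the values from your own environment. The last line queries the untrained critic for one sample state-action pair, which gives a rough idea of whether its output is on the same order of magnitude as your episode reward.

```matlab
% Placeholder dimensions (assumptions); use your environment's specs instead.
numObs = 4;
numAct = 2;
obsInfo = rlNumericSpec([numObs 1]);
actInfo = rlNumericSpec([numAct 1],'LowerLimit',-1,'UpperLimit',1);

% Small critic: one 20-unit layer per input path, merged by addition.
statePath = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(20,'Name','stateFC')];
actionPath = [
    imageInputLayer([numAct 1 1],'Normalization','none','Name','action')
    fullyConnectedLayer(20,'Name','actionFC')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','relu')
    fullyConnectedLayer(1,'Name','QValue')];

criticNet = layerGraph(statePath);
criticNet = addLayers(criticNet,actionPath);
criticNet = addLayers(criticNet,commonPath);
criticNet = connectLayers(criticNet,'stateFC','add/in1');
criticNet = connectLayers(criticNet,'actionFC','add/in2');

% Learning rate here is an assumed starting point, not a recommendation
% from the answer; the gradient threshold matches the question's setting.
criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNet,obsInfo,actInfo, ...
    'Observation',{'state'},'Action',{'action'},criticOpts);

% Query the untrained critic for a sample state-action pair and compare
% the magnitude of the result with a typical episode reward.
qSample = getValue(critic,{rand(numObs,1)},{zeros(numAct,1)})
```

If qSample differs from the episode reward by orders of magnitude, the two curves share one axis in the Episode Manager and the reward curve can look flat even when it is changing, which is why hiding Q0 is a useful first check.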