How can we analyse the learning curves of a reinforcement learning agent's training to diagnose what is wrong with the design of our network or reward function?
Hi everyone,
I'm trying to train a PPO (continuous) agent, and I have some general questions about reinforcement learning:
How can we analyse training curves to diagnose what is wrong with our design (agent type, network layer types and sizes, and training parameters)?
More specifically, I'm training a controller agent that outputs the four PID gain values to be used (P, I, D and N). I'm trying to make it learn to follow the reference speed (image on the right). The same image also shows, in yellow, the instant reward, which is set to 10 if the last five values of the output speed are close to the reference speed (less than 10 rpm of error).
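The reward rule described above can be sketched as follows. This is only a minimal Python illustration, not the actual implementation used in the training setup: the function name `instant_reward`, the reward of 0 when the condition is not met, the strict "less than" comparison, and the sample speed values are all assumptions made for the example.

```python
from collections import deque

WINDOW = 5          # number of most recent speed samples checked (from the post)
TOLERANCE_RPM = 10  # maximum allowed absolute error in rpm (from the post)
BONUS = 10.0        # reward granted when the whole window is within tolerance (from the post)


def instant_reward(recent_errors):
    """Return BONUS if the last WINDOW errors are all within tolerance, else 0.

    The value 0 for the "otherwise" case is an assumption; the post does not state it.
    """
    if len(recent_errors) < WINDOW:
        return 0.0  # assumption: no reward until the window is full
    last = list(recent_errors)[-WINDOW:]
    return BONUS if all(abs(e) < TOLERANCE_RPM for e in last) else 0.0


# Usage with made-up speed samples in rpm (illustrative values only).
errors = deque(maxlen=WINDOW)
reference = [1000, 1000, 1000, 1000, 1000, 1000]
measured = [950, 995, 1003, 998, 1001, 1004]
rewards = []
for ref, out in zip(reference, measured):
    errors.append(ref - out)          # tracking error at this time step
    rewards.append(instant_reward(errors))
```

Under these assumptions the reward is non-zero only once the error has stayed small for five consecutive samples, so it may be worth keeping in mind that such a sparse signal can leave the episode reward flat for a while early in training.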
This is my current result:

In this picture, the system performs pretty well.
Can I conclude from these curves:
- that my training will not converge?
- that the number of learnables in my network is too small?
- that I did not choose a good learning rate (and if so, is it too small or too big)?
- that my reward function is inefficient?
- anything else?
- or that we cannot conclude anything?
I can provide any other information if needed (just tell me what you need, please)! The size of my observation vector is [13, 1].
Thanks a lot in advance for your help!
Best regards,
Nicolas