Determining the reward value at which to stop training an RL agent
I saw this sentence in an example that uses an RL agent:
- Stop training when the agent receives an average cumulative reward greater than -355 over 100 consecutive episodes. At this point, the agent can control the level of water in the tank.
How was the exact reward value of -355 over 100 episodes calculated? Are there any tips that could help me know when to stop training at a specific point, before the agent's performance gets worse?
Thank you in advance.
Accepted Answer
More Answers (1)
Sam Chak
on 17 Oct 2022
2 Comments
H. M.
on 17 Oct 2022
Francisco Serra
on 14 Dec 2023
For example, imagine you are using an RL agent for a control problem. You can use a classical controller as a reference and evaluate it with the same cost function you use for the RL agent. Then you run some simulations with that controller and see how it performs, which gives you an idea of the cumulative reward your RL agent should reach. However, if you don't have a working reference to guide you, you have to do what @Emmanouil Tzorakoleftherakis said.
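To make the idea concrete, here is a rough sketch in Python (not MATLAB, and not the code from the water tank example). Everything in it is an assumption made for illustration: the toy first-order tank model, the PI gains, the `reward` function (negative squared error) and the helper names `run_pi_episode`, `baseline_reward` and `should_stop` are all hypothetical. It does not reproduce the -355 value from the quoted example. It only shows how a reference controller scored with the same reward could give you a candidate stopping threshold, and how a "mean over the last 100 episodes" check might look.

```python
import random


def reward(error):
    # Hypothetical per-step reward: negative squared tracking error.
    # Use the same function for the reference controller and the RL agent.
    return -error ** 2


def run_pi_episode(setpoint, kp=2.0, ki=0.5, dt=0.1, steps=200):
    """Simulate a PI controller on a toy tank; return the cumulative reward."""
    level, integral, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        error = setpoint - level
        integral += error * dt
        inflow = max(0.0, kp * error + ki * integral)  # pump cannot run backwards
        level += dt * (inflow - 0.5 * level)  # assumed dynamics: inflow minus outflow
        total += reward(error)
    return total


def baseline_reward(n_episodes=20, seed=0):
    """Average cumulative reward of the reference controller over random setpoints."""
    rng = random.Random(seed)
    returns = [run_pi_episode(rng.uniform(1.0, 3.0)) for _ in range(n_episodes)]
    return sum(returns) / len(returns)


def should_stop(episode_rewards, threshold, window=100):
    """True once the mean of the last `window` episode rewards exceeds `threshold`."""
    if len(episode_rewards) < window:
        return False
    recent = episode_rewards[-window:]
    return sum(recent) / window > threshold


# Candidate stopping value derived from the reference controller.
baseline = baseline_reward()
print(f"Reference controller average cumulative reward: {baseline:.1f}")
```

In a training loop you would append each episode's cumulative reward to a list and call `should_stop(rewards, baseline)` after every episode. Whether you stop exactly at the reference value or somewhat below or above it is a judgment call that depends on how good the reference controller is.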
