Episode Q0 increases exponentially
Can anyone explain why Episode Q0 in reinforcement learning increases exponentially after the reward has converged to a suboptimal policy?

Answers (1)
Emmanouil Tzorakoleftherakis
on 16 Feb 2021
Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
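As a rough illustration of the normalization suggestion, here is a minimal, toolbox-independent sketch in Python. It assumes Episode Q0 is the critic's value estimate at the initial observation, so keeping observations, actions, and rewards on a comparable scale (roughly order 1) may help keep the critic's targets from growing without bound. The names `RunningNormalizer`, `scale_action`, `unscale_action`, and `scale_reward` are hypothetical helpers written for this example, not part of any library, and the reward scale is a value you would have to choose for your own environment.

```python
import math


class RunningNormalizer:
    """Running per-dimension mean and variance (Welford's algorithm)."""

    def __init__(self, dim, eps=1e-8):
        self.n = 0                 # number of observations seen so far
        self.mean = [0.0] * dim    # running mean per dimension
        self.m2 = [0.0] * dim      # running sum of squared deviations
        self.eps = eps             # avoids division by zero

    def update(self, x):
        """Fold one observation vector into the running statistics."""
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def normalize(self, x):
        """Return x shifted to zero mean and scaled to unit variance."""
        if self.n < 2:
            # Not enough data for a variance estimate: only center.
            return [xi - m for xi, m in zip(x, self.mean)]
        out = []
        for xi, m, m2 in zip(x, self.mean, self.m2):
            std = math.sqrt(m2 / (self.n - 1))  # sample standard deviation
            out.append((xi - m) / (std + self.eps))
        return out


def scale_action(a, low, high):
    """Map an action from [low, high] to [-1, 1]."""
    return 2.0 * (a - low) / (high - low) - 1.0


def unscale_action(u, low, high):
    """Map a normalized action in [-1, 1] back to [low, high]."""
    return low + (u + 1.0) * (high - low) / 2.0


def scale_reward(r, scale):
    """Divide the raw reward by a fixed, user-chosen scale (an assumption)."""
    return r / scale


# Example: observations whose dimensions differ by two orders of magnitude.
norm = RunningNormalizer(dim=2)
for obs in [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]:
    norm.update(obs)

normalized_obs = norm.normalize([3.0, 300.0])  # both entries end up on the same scale
agent_action = scale_action(5.0, 0.0, 10.0)    # midpoint of the range maps to 0
env_action = unscale_action(agent_action, 0.0, 10.0)
scaled_reward = scale_reward(500.0, 100.0)
```

The agent would then be trained on the normalized observations and scaled rewards, with its output mapped back through `unscale_action` before being applied to the environment.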
Hope this helps