
Number of look-ahead steps in DDPG Agent Options
How does the parameter "NumStepsToLookAhead" in rlDDPGAgentOptions from the Reinforcement Learning Toolbox of MATLAB R2019b work?
- Is the look-ahead applied to the target networks? (like a modification of the critic objective from {r + gamma*Qt - Q} to {r + sum_i(gamma^i*Qt) - Q})
- Or is the look-ahead applied to the reward sampling itself? (like changing the reward r of each sample to r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ...)
Any help is highly appreciated.
Answers (1)
Anh Tran
on 1 Mar 2020
I am not sure what "reward sampling" means. "NumStepsToLookAhead" in rlDDPGAgentOptions changes the critic's target values in step 5 of the DDPG training algorithm.
Assuming g is the discount factor, the critic target is as follows:
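For reference, the standard n-step DDPG critic target has the form below, written with the reward terms R_t, ..., R_{t+n-1} that the comment further down refers to. Here n stands for NumStepsToLookAhead, Q' is the target critic and μ' the target actor in the usual DDPG notation; this is a sketch of the textbook form, not a transcription of the toolbox's internals.

```latex
y_t = \sum_{k=0}^{n-1} g^{k} R_{t+k} + g^{n}\, Q'\!\left(S_{t+n},\, \mu'\!\left(S_{t+n} \mid \theta^{\mu'}\right) \mid \theta^{Q'}\right)
```

With n = 1 this reduces to the ordinary DDPG target R_t + g Q'(S_{t+1}, μ'(S_{t+1})), so the look-ahead changes both the reward part (a discounted sum of n rewards) and the bootstrap part (the target networks are evaluated n steps ahead, discounted by g^n).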

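A minimal Python sketch of the same target computation, assuming the n rewards and the target critic's value at S_{t+n} are already available. The function name n_step_target and its arguments are hypothetical, not toolbox API, and it assumes the common convention that the bootstrap term is dropped when the episode terminates inside the n-step window.

```python
def n_step_target(rewards, gamma, bootstrap_value, terminated):
    """Hypothetical helper: n-step critic target.

    rewards: the rewards R_t, ..., R_{t+n-1} (may be shorter than n
        if the episode ended early)
    gamma: discount factor (g in the answer)
    bootstrap_value: target-critic estimate Q'(S_{t+n}, mu'(S_{t+n}))
    terminated: True if the episode ended within the window; then the
        bootstrap term is omitted (assumed convention)
    """
    target = 0.0
    # Discounted sum of the observed rewards: sum_k gamma^k * R_{t+k}
    for k, r in enumerate(rewards):
        target += (gamma ** k) * r
    # Bootstrap from the target networks n steps ahead, discounted by gamma^n
    if not terminated:
        target += (gamma ** len(rewards)) * bootstrap_value
    return target
```

For example, n_step_target([1.0, 1.0, 1.0], 0.5, 8.0, False) gives 1 + 0.5 + 0.25 + 0.125 * 8 = 2.75.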
4 Comments
Dingshan Sun
on 1 Sep 2022
Could you give a hint on how R_t, R_{t+1}, R_{t+2}, ..., R_{t+n-1} can be obtained in an online off-policy algorithm, especially for DRL methods that use experience replay?
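One common way to obtain R_t, ..., R_{t+n-1} with experience replay is to keep a short window of the last n one-step transitions while interacting with the environment, and to store their discounted sum as a single replay entry together with S_t, A_t and S_{t+n}. The class below is a hypothetical sketch of that idea (NStepReplayBuffer and its methods are illustrative names, not toolbox API), and it is not necessarily how rlDDPGAgent does it internally. An alternative is to store raw one-step transitions and assemble n consecutive ones at sampling time.

```python
from collections import deque
import random


class NStepReplayBuffer:
    """Hypothetical n-step replay buffer (illustrative only)."""

    def __init__(self, n, gamma, capacity=100_000):
        self.n = n
        self.gamma = gamma
        self.window = deque()                 # last (state, action, reward) tuples
        self.storage = deque(maxlen=capacity) # finished n-step entries

    def _emit_oldest(self, next_state, done):
        # Discounted sum of the rewards in the window, starting at its oldest entry
        ret = 0.0
        for k, (_, _, r) in enumerate(self.window):
            ret += (self.gamma ** k) * r
        state, action, _ = self.window[0]
        # Keep the number of summed rewards so the target can use gamma ** steps
        self.storage.append((state, action, ret, next_state, done, len(self.window)))
        self.window.popleft()

    def add(self, state, action, reward, next_state, done):
        self.window.append((state, action, reward))
        if done:
            # Episode ended: flush all partial windows; no bootstrap is needed
            while self.window:
                self._emit_oldest(next_state, True)
        elif len(self.window) == self.n:
            # Window full: next_state is S_{t+n} for the oldest entry
            self._emit_oldest(next_state, False)

    def sample(self, batch_size):
        return random.sample(list(self.storage), batch_size)
```

Note that the stored return reflects the actions actually taken at t+1, ..., t+n-1 by the behavior policy, so without a correction the n-step target is slightly biased in the off-policy setting; this is usually tolerated for small n.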