
number of look ahead steps in DDPG Agent Options


ALOK RANJAN SWAIN on 21 Feb 2020


https://it.mathworks.com/matlabcentral/answers/506744-number-of-look-ahead-steps-in-ddpg-agent-options

Commented: Dingshan Sun on 1 Sep 2022

How does the parameter "NumStepsToLookAhead" in rlDDPGAgentOptions, from the Reinforcement Learning Toolbox of MATLAB 2019b, work?

Is the look-ahead applied to the target networks (i.e., a modification of the critic objective from {r+gamma*Qt - Q} to {r+ sum(gamma**i*Qt) -Q})?
Or is the look-ahead applied to the reward sampling itself (i.e., changing the reward "r" of each sample to "r+gamma*r_t+gamma**2*r_t+1+...")?

Any help is highly appreciated.


Answers (1)

Anh Tran on 1 Mar 2020


https://it.mathworks.com/matlabcentral/answers/506744-number-of-look-ahead-steps-in-ddpg-agent-options#answer_417996

I am not sure what "reward sampling" means. "NumStepsToLookAhead" in rlDDPGAgentOptions changes the critic's target values in step 5 of the DDPG training algorithm.

Assuming g is the discount factor, the critic target is as follows:
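The formula itself is not shown here. Assuming the standard n-step return, with n = NumStepsToLookAhead and the reward names R_t, ..., R_t+n-1 used in the follow-up comment, it would read as below. The symbols Q' (target critic), \mu' (target actor) and S_{t+n} (the state reached after n steps) are notation assumed for this sketch.

```latex
y_t = R_t + g\,R_{t+1} + g^{2} R_{t+2} + \dots + g^{\,n-1} R_{t+n-1}
      + g^{\,n}\, Q'\!\left(S_{t+n},\, \mu'(S_{t+n})\right)
```

With n = 1 this reduces to the usual one-step target y_t = R_t + g Q'(S_{t+1}, \mu'(S_{t+1})). In terms of the question, both things happen together: the rewards of the next n steps are summed with discounting, and the target-network value is evaluated n steps ahead and discounted by g^n.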


ALOK RANJAN SWAIN on 4 Mar 2020

Thanks for your help.

Dingshan Sun on 1 Sep 2022

Could you give a hint on how R_t, R_t+1, R_t+2, ..., R_t+n-1 can be obtained in an online off-policy algorithm, especially for DRL methods that use an experience replay?
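One common way to obtain these rewards with an experience replay is to hold the most recent n transitions in a small sliding window and store one aggregated n-step transition once the window is full. The sketch below shows that general technique in Python. It is not a confirmed description of what the toolbox does internally, and all names (`NStepBuffer`, `push`, `n_step_target`, `target_q`) are hypothetical.

```python
from collections import deque


class NStepBuffer:
    """Sliding window that turns 1-step transitions into n-step ones (hypothetical helper)."""

    def __init__(self, n_steps, gamma):
        self.n_steps = n_steps
        self.gamma = gamma
        self.window = deque()  # pending (state, action, reward) tuples
        self.replay = []       # stored n-step transitions

    def _emit(self, next_state, done):
        # Discounted sum of the rewards currently held in the window.
        ret = sum(self.gamma ** k * r for k, (_, _, r) in enumerate(self.window))
        state, action, _ = self.window[0]
        # Store the step count so the bootstrap term can use gamma ** steps.
        self.replay.append((state, action, ret, next_state, done, len(self.window)))
        self.window.popleft()

    def push(self, state, action, reward, next_state, done):
        self.window.append((state, action, reward))
        if done:
            # Episode ended: flush the shorter tails; none of them bootstrap.
            while self.window:
                self._emit(next_state, True)
        elif len(self.window) == self.n_steps:
            self._emit(next_state, False)


def n_step_target(transition, gamma, target_q):
    """Critic target: n-step return plus the discounted target-network value.

    target_q stands in for Q'(s, mu'(s)) and is an assumed callable.
    """
    _, _, ret, next_state, done, steps = transition
    bootstrap = 0.0 if done else gamma ** steps * target_q(next_state)
    return ret + bootstrap
```

Each stored sample then already carries R_t + g*R_t+1 + ... + g^(n-1)*R_t+n-1 together with the state reached after n steps, so minibatches can be drawn from the replay exactly as in the one-step case. Note that the intermediate rewards were collected by the behavior policy at the time of storage, so the uncorrected n-step return is only approximately off-policy; this is usually tolerated for small n.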
