Reinforcement Learning Toolbox - When does the algorithm train?

I am currently using the RL-Toolbox with a DQN agent built into a long-running process simulation.
The maximum step count is currently 8000 steps per episode.
Unfortunately, the documentation seems a little ambiguous to me, so here is my question:
Does the train function of the RL-Toolbox train the agent at the end of an episode, or during the episode once the step count exceeds the minibatch size (as in the baseline algorithms)?
Thank you in advance.

Accepted Answer

The implementation is based on the algorithm listed here.
The weights are updated at each time step.
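
To make the timing concrete, here is a minimal Python sketch of a per-step update loop. It is not the toolbox's code: the names `run_episode` and `q_update`, the toy environment, and the tabular Q-update (standing in for a neural-network critic) are all made up for illustration. It also assumes that updates are skipped until the replay buffer holds at least one minibatch, which is a common convention and not something confirmed above for the toolbox.

```python
import random


def q_update(q, batch, alpha=0.1, gamma=0.9):
    """One learning update from a minibatch (tabular stand-in for a critic update)."""
    for state, action, reward, next_state in batch:
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])


def run_episode(max_steps, minibatch_size, buffer, q, rng):
    """Toy episode: learning happens inside the step loop, not after it."""
    updates = 0
    state = 0
    for _ in range(max_steps):
        action = rng.choice([0, 1])          # placeholder random policy
        next_state = (state + action) % 5    # placeholder dynamics
        reward = 1.0 if next_state == 0 else 0.0
        buffer.append((state, action, reward, next_state))
        # Assumption: wait until the buffer can supply a full minibatch.
        if len(buffer) >= minibatch_size:
            q_update(q, rng.sample(buffer, minibatch_size))
            updates += 1
        state = next_state
    return updates


rng = random.Random(0)
q_table = [[0.0, 0.0] for _ in range(5)]
replay_buffer = []  # unbounded here; real buffers are usually capped
print(run_episode(8000, 64, replay_buffer, q_table, rng))  # 7937
```

With 8000 steps and a minibatch of 64, this sketch performs 7937 updates in the first episode: one per step from the 64th step onward. In later episodes the buffer is already filled, so every step triggers an update.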

1 Comment

"For each training time step" - that was the line I was looking for (though looking into the source code led me to the same conclusion).
After double-checking the baseline algorithms, I found that they do it the same way.
Thank you for your time!

Release

R2019a
