PPO Agent - Initialization of actor and critic networks
Whenever a PPO agent is initialized in MATLAB, the documentation states that the parameters of both the actor and the critic networks are set randomly. However, this is not the only possible choice: other initialization schemes exist (e.g., orthogonal initialization), and these can sometimes improve the agent's subsequent performance.
- Is there a reason why random initialization was chosen as the default method here?
- Is it possible to easily specify a different initialization method in the Reinforcement Learning Toolbox, without starting from scratch?
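For context, one way to control initialization without modifying the toolbox is to build the actor and critic networks yourself and set each layer's `WeightsInitializer` (e.g., to `'orthogonal'`) before constructing the agent. The sketch below assumes a simple environment `env` with a vector observation and a discrete action space; the layer sizes (64 units) and the constructor choices (`rlValueFunction`, `rlDiscreteCategoricalActor`) are illustrative assumptions, not a definitive recipe.

```matlab
% Assumed: env is an existing environment object
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Critic network with orthogonal weight initialization on each learnable layer
criticNet = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(64, 'WeightsInitializer', 'orthogonal')
    reluLayer
    fullyConnectedLayer(1, 'WeightsInitializer', 'orthogonal')];
critic = rlValueFunction(criticNet, obsInfo);

% Actor network for a discrete action space, same initialization scheme
actorNet = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(64, 'WeightsInitializer', 'orthogonal')
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements), 'WeightsInitializer', 'orthogonal')];
actor = rlDiscreteCategoricalActor(actorNet, obsInfo, actInfo);

% The agent then uses these pre-initialized networks instead of defaults
agent = rlPPOAgent(actor, critic);
```

Passing a custom actor and critic to `rlPPOAgent` bypasses the default random initialization entirely, so any initializer supported by the Deep Learning Toolbox layers (or a custom function handle) can be used.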