PPO convergence guarantee in RL toolbox
Hi,
I am testing my environment with the PPO algorithm in the RL Toolbox. I recently came across this paper: https://arxiv.org/abs/2012.01399, which lists assumptions for the convergence guarantee of PPO. Some of them concern the environment itself (like the transition kernel...), and some concern the functions and parameters of the algorithm (like the learning rate alpha, the update function h...).
I am not sure whether the PPO algorithm in the RL Toolbox satisfies the convergence assumptions on the algorithm's functions and parameters, because I did not find any direct mention of convergence on the official MathWorks website, so I am wondering whether convergence was taken into account in the algorithm's design.
Do I need to look into the train() function to see how those parameters and functions are implemented?
Thank you
Accepted Answer
Karan Singh
on 17 Jun 2024
Hi Haochen,
The Proximal Policy Optimization algorithm in MATLAB's Reinforcement Learning Toolbox is based on the original PPO paper by Schulman et al. (2017), as referenced in the documentation (https://www.mathworks.com/help/reinforcement-learning/ug/proximal-policy-optimization-agents.html).
It follows the standard PPO formulation under which convergence results like the ones you cite are usually stated, but the documentation does not make an explicit convergence claim. As with most RL algorithms, whether those assumptions hold in practice depends on several factors, including the hyperparameter settings (learning rates, clip factor, and so on), the complexity of the environment, and implementation details.
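If it helps, the quantities that such convergence assumptions typically constrain (step sizes, the clipping constant, batch sizes) are exposed through the agent options rather than hidden inside train(). Below is a minimal sketch, assuming a recent toolbox release (option and property names have changed across releases); the observation/action specs and numeric values are purely illustrative, not a recommendation:
obsInfo = rlNumericSpec([4 1]);        % hypothetical observation spec
actInfo = rlFiniteSetSpec([-1 1]);     % hypothetical discrete action spec
opt = rlPPOAgentOptions( ...
    "ClipFactor", 0.2, ...             % PPO surrogate clipping constant
    "EntropyLossWeight", 0.01, ...
    "ExperienceHorizon", 512, ...
    "MiniBatchSize", 64, ...
    "DiscountFactor", 0.99);
opt.ActorOptimizerOptions.LearnRate  = 1e-4;   % learning rate ("alpha" in the paper)
opt.CriticOptimizerOptions.LearnRate = 1e-3;
agent = rlPPOAgent(obsInfo, actInfo, opt);     % default actor/critic networks
None of this changes the update rule itself; it only sets the constants that the paper's assumptions refer to, which you can tune before calling train().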
Regarding the source code, full access to the detailed internals of the implementation may not be possible.
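That said, you can check what actually ships with your installation using standard MATLAB commands; what they show depends on your release:
which rlPPOAgent -all    % where the agent class resolves from on your path
edit rlPPOAgent          % opens the file if it is plain MATLAB code; p-coded or built-in parts will not be readable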
More Answers (0)