- Action Bounds: Ensure that the action bounds are correctly defined. If the boundaries are too restrictive, the agent might struggle to learn effective actions.
- Normalization: Normalizing the inputs and outputs can significantly improve training stability. With states up to 20000 and actions up to 15000, consider scaling both state and action values to a common range (e.g., [0, 1]).
- Custom Environment: Verify that your custom environment is correctly implemented. Double-check the reward function, state representation, and action space.
- Exploration Noise: TD3 adds noise to its actions to drive exploration. Ensure that the noise level is appropriate for your action range during training.
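To make the normalization and noise points concrete, here is a minimal, language-agnostic sketch in plain Python (not Reinforcement Learning Toolbox code). It assumes the ranges reported in the question (states in 0 to 20000, actions in 0 to 15000) and an actor whose raw output lies in [-1, 1], for example after a tanh layer; [0, 1] would work the same way. The function names and the noise level `sigma=0.1` are hypothetical, chosen only for illustration.

```python
import random

# Ranges taken from the question; adjust to your own environment.
STATE_LOW, STATE_HIGH = 0.0, 20000.0
ACTION_LOW, ACTION_HIGH = 0.0, 15000.0


def normalize_state(x, low=STATE_LOW, high=STATE_HIGH):
    """Map a raw state value from [low, high] to [-1, 1]."""
    return 2.0 * (x - low) / (high - low) - 1.0


def scale_action(a, low=ACTION_LOW, high=ACTION_HIGH):
    """Map an actor output in [-1, 1] to the physical range [low, high]."""
    a = max(-1.0, min(1.0, a))  # clip in case noise pushed it out of range
    return low + 0.5 * (a + 1.0) * (high - low)


def noisy_action(a, sigma=0.1, rng=random):
    """Add Gaussian exploration noise in normalized units, then rescale.

    sigma is an assumed, illustrative value (10% of the half-range).
    """
    return scale_action(a + rng.gauss(0.0, sigma))
```

With these helpers, a state of 10000 is fed to the network as 0.0, and an actor output of 0.0 corresponds to a physical action of 7500. Keeping the noise in normalized units means its scale stays comparable to the actor output, rather than being negligible or overwhelming relative to a 0 to 15000 range.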
TD3 algorithm: actions always output boundary values during training
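After training with the TD3 algorithm, regardless of whether the reward curve converged during training, the action output is always a boundary value or completely incorrect. My state values range from 0 to 20000, and the action bounds are 0 to 15000. What is going wrong? Is the custom environment created incorrectly, or is it something else? Do I need to normalize the inputs and outputs?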
Answers (1)
UDAYA PEDDIRAJU
on 14 Mar 2024
Hi 泽宇,
Regarding your issue with the TD3 algorithm, where the actions always sit at the boundary values regardless of whether the reward curve converges, it's essential to investigate a few potential factors:
You can also refer to the TD3 agent documentation: https://www.mathworks.com/help/reinforcement-learning/ug/td3-agents.html.