- Critic instability: Rapidly growing Q-values (e.g., Q0 climbing from ~0 to ~70 in the log) indicate critic divergence. Reduce the learning rates (actor: 1e-4, critic: 1e-3) and use a target smoothing factor τ ≈ 0.005.
- Unscaled rewards/observations: Very large negative rewards destabilize training. Normalize inputs/outputs or scale the reward; see "Normalize Data in RL Agents".
- Exploration loss: Keep non-zero Gaussian action noise and avoid decaying it too early.
- Monitor critic loss and Q-values: Use the Episode Manager to confirm when instability starts.
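The first three points above can be sketched as TD3 agent options in Reinforcement Learning Toolbox. This is a minimal sketch, not a drop-in fix: `Ts` and the noise standard deviation of 0.1 are assumptions you should tune for your plant, and property names follow recent toolbox releases.

```matlab
% Sketch: tamer TD3 hyperparameters (assumed sample time Ts).
agentOpts = rlTD3AgentOptions( ...
    SampleTime=Ts, ...
    TargetSmoothFactor=5e-3, ...      % tau ~ 0.005
    MiniBatchSize=256, ...
    ExperienceBufferLength=1e6);

% Lower learning rates: actor 1e-4, both twin critics 1e-3.
agentOpts.ActorOptimizerOptions.LearnRate = 1e-4;
agentOpts.ActorOptimizerOptions.GradientThreshold = 1;     % clip gradients
for k = 1:numel(agentOpts.CriticOptimizerOptions)
    agentOpts.CriticOptimizerOptions(k).LearnRate = 1e-3;
    agentOpts.CriticOptimizerOptions(k).GradientThreshold = 1;
end

% Keep exploration alive: non-zero Gaussian noise, no early decay.
agentOpts.ExplorationModel.StandardDeviation = 0.1;        % assumed scale
agentOpts.ExplorationModel.StandardDeviationDecayRate = 0;
agentOpts.ExplorationModel.StandardDeviationMin = 0.05;
```

Gradient clipping (`GradientThreshold`) is optional but cheap insurance against the kind of sudden critic blow-up described below.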
“TD3 performs well, then abruptly saturates at bad actions (Simulink RL) — why?”
19 views (last 30 days)
Hello everyone,
I’m training a controller with TD3 because I want a deterministic policy. It performs well at first, but then it suddenly gets stuck producing very poor actions with no warning. This seems odd: a greedy policy shouldn’t lock onto a fixed trajectory even though that trajectory earns a very poor reward. I’ve tried different exploration settings and input normalization, but the problem keeps returning.
Has anyone seen this on the MathWorks platform? What could cause TD3 to collapse like this, and how can I prevent it?
Episode: 1/200 | Episode reward: -150144.31 | Episode steps: 250 | Average reward: -150144.31 | Step Count: 250 | Episode Q0: -0.06
Episode: 2/200 | Episode reward: -142781.27 | Episode steps: 250 | Average reward: -146462.79 | Step Count: 500 | Episode Q0: -0.06
Episode: 3/200 | Episode reward: -146085.77 | Episode steps: 250 | Average reward: -146337.12 | Step Count: 750 | Episode Q0: -0.07
Episode: 4/200 | Episode reward: -127403.65 | Episode steps: 250 | Average reward: -141603.75 | Step Count: 1000 | Episode Q0: -0.14
Episode: 5/200 | Episode reward: -90967.66 | Episode steps: 250 | Average reward: -131476.53 | Step Count: 1250 | Episode Q0: -0.24
Episode: 6/200 | Episode reward: -58398.87 | Episode steps: 250 | Average reward: -119296.92 | Step Count: 1500 | Episode Q0: -0.28
Episode: 7/200 | Episode reward: -35903.63 | Episode steps: 250 | Average reward: -107383.59 | Step Count: 1750 | Episode Q0: -0.28
Episode: 8/200 | Episode reward: -10701.52 | Episode steps: 250 | Average reward: -95298.34 | Step Count: 2000 | Episode Q0: -0.06
Episode: 9/200 | Episode reward: -9437.55 | Episode steps: 250 | Average reward: -85758.25 | Step Count: 2250 | Episode Q0: 0.89
Episode: 10/200 | Episode reward: -17715.87 | Episode steps: 250 | Average reward: -78954.01 | Step Count: 2500 | Episode Q0: 2.43
Episode: 11/200 | Episode reward: -34624.05 | Episode steps: 250 | Average reward: -67401.98 | Step Count: 2750 | Episode Q0: 4.24
Episode: 12/200 | Episode reward: -40353.72 | Episode steps: 250 | Average reward: -57159.23 | Step Count: 3000 | Episode Q0: 6.50
Episode: 13/200 | Episode reward: -42417.75 | Episode steps: 250 | Average reward: -46792.43 | Step Count: 3250 | Episode Q0: 7.25
Episode: 14/200 | Episode reward: -43329.38 | Episode steps: 250 | Average reward: -38385.00 | Step Count: 3500 | Episode Q0: 8.65
Episode: 15/200 | Episode reward: -21137.36 | Episode steps: 250 | Average reward: -31401.97 | Step Count: 3750 | Episode Q0: 10.46
Episode: 16/200 | Episode reward: -20629.98 | Episode steps: 250 | Average reward: -27625.08 | Step Count: 4000 | Episode Q0: 12.07
Episode: 17/200 | Episode reward: -190383.39 | Episode steps: 250 | Average reward: -43073.06 | Step Count: 4250 | Episode Q0: 15.93
Episode: 18/200 | Episode reward: -188099.18 | Episode steps: 250 | Average reward: -60812.82 | Step Count: 4500 | Episode Q0: 16.85
Episode: 19/200 | Episode reward: -188479.67 | Episode steps: 250 | Average reward: -78717.04 | Step Count: 4750 | Episode Q0: 16.95
Episode: 20/200 | Episode reward: -189525.03 | Episode steps: 250 | Average reward: -95897.95 | Step Count: 5000 | Episode Q0: 18.68
Episode: 21/200 | Episode reward: -189286.51 | Episode steps: 250 | Average reward: -111364.20 | Step Count: 5250 | Episode Q0: 18.17
Episode: 22/200 | Episode reward: -190229.29 | Episode steps: 250 | Average reward: -126351.75 | Step Count: 5500 | Episode Q0: 19.29
Episode: 23/200 | Episode reward: -188722.45 | Episode steps: 250 | Average reward: -140982.22 | Step Count: 5750 | Episode Q0: 20.18
Episode: 24/200 | Episode reward: -189155.54 | Episode steps: 250 | Average reward: -155564.84 | Step Count: 6000 | Episode Q0: 21.27
Episode: 25/200 | Episode reward: -187477.81 | Episode steps: 250 | Average reward: -172198.88 | Step Count: 6250 | Episode Q0: 21.23
Episode: 26/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -188844.49 | Step Count: 6500 | Episode Q0: 23.44
Episode: 27/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -188514.75 | Step Count: 6750 | Episode Q0: 25.41
Episode: 28/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -188413.43 | Step Count: 7000 | Episode Q0: 33.53
Episode: 29/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -188274.06 | Step Count: 7250 | Episode Q0: 43.19
Episode: 30/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -188030.16 | Step Count: 7500 | Episode Q0: 48.98
Episode: 31/200 | Episode reward: -187086.00 | Episode steps: 250 | Average reward: -187810.11 | Step Count: 7750 | Episode Q0: 69.61
Answers (1)
Satyam
on 17 Oct 2025 at 6:54
This TD3 “collapse” usually comes from critic divergence or a loss of exploration. You can troubleshoot it with the steps summarized above.
I hope this fixes your issue.
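To catch the instability early (and keep a good policy to fall back on), a sketch using `rlTrainingOptions` follows. The reward threshold for saving agents is an assumption read off the log above (episodes 8–9 were the best); adjust it for your task.

```matlab
% Sketch: monitor Q0/reward in the Episode Manager and snapshot good agents.
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=200, ...
    MaxStepsPerEpisode=250, ...
    Plots="training-progress", ...          % opens the Episode Manager
    SaveAgentCriteria="EpisodeReward", ...  % keep agents that did well
    SaveAgentValue=-20000, ...              % assumed threshold (cf. episodes 8-9)
    SaveAgentDirectory="savedAgents");

trainingStats = train(agent, env, trainOpts);

% Q0 drifting far above any achievable return (here: ~0 up to ~70 while
% episode rewards stay hugely negative) is the signature of critic divergence.
plot(trainingStats.EpisodeQ0), ylabel("Episode Q0")
```

If training does collapse, you can reload the last saved pre-collapse agent from `savedAgents` instead of restarting from scratch.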