When DDPG optimizes PID parameters, how do I keep the first 10s of data from the system stabilization phase out of the experienceBuffer?

I am doing adaptive PID control with Simulink's built-in RL Agent block. The first 20 s of each simulation are a settling (buffer) phase for the system: they are not part of the transient response, but they have to be simulated. How can I make the agent ignore the actions, states, rewards, and other information from the first 20 s, or otherwise keep the first 20 s from affecting training? In practice, if the agent learns from the first 10 s, the training result is very poor.

Answers (1)

MULI on 25 Mar 2025
Hi @Yu,
To keep the first 20 seconds from affecting your reinforcement learning agent in Simulink, you can try the following approaches:
  • Modify the Reward: Use a "Clock" block to track simulation time and set the reward to zero for the first 20 seconds, so that this phase contributes nothing to the cumulative reward.
  • Skip Initial States: Use a "Switch" block driven by a "Relational Operator" block to ignore the agent's actions and states for the first 20 seconds, so that the agent effectively interacts with the system only after this buffer period.
  • Custom Reset: Begin each training episode from the state the system reaches at 20 seconds (for example, via the environment's reset function). This skips the initial buffer phase, so the agent only sees the interactions that matter.
These steps should help your agent focus on the relevant part of the simulation and improve training efficiency.
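
As a rough illustration of the first two ideas, here is a minimal sketch of the gating logic, written as plain MATLAB so it can be run on its own. The names `tSettle`, `gateReward`, and `gateAction` are hypothetical, and the 20 s threshold and the sample values are assumptions taken from the question, not from any model. In Simulink, the same expressions could be driven by the "Clock" output. As far as I know, gating like this only changes what the agent sees and applies; the transitions from the settling phase are still stored in the experience buffer, which is why the custom reset may be the cleaner option.

```matlab
% Sketch only: gate the reward and the applied action during settling.
% tSettle, gateReward and gateAction are hypothetical names.
tSettle = 20;   % settling time in seconds (assumed from the question)

% Reward passed to the agent: zero before tSettle, raw reward afterwards.
gateReward = @(rawReward, t) rawReward .* (t >= tSettle);

% Action applied to the plant: a fixed baseline (for example nominal PID
% gains) before tSettle, the agent's action afterwards.
gateAction = @(agentAction, baselineAction, t) ...
    baselineAction .* (t < tSettle) + agentAction .* (t >= tSettle);

% Evaluate at t = 5 s (settling phase) and t = 25 s (after settling).
r = [gateReward(-3.2, 5), gateReward(-3.2, 25)];
k = [gateAction(0.8, 1.5, 5), gateAction(0.8, 1.5, 25)];
disp(r)
disp(k)
```

Multiplying by the logical mask keeps the expressions vectorized, so `t` can also be a whole time vector when checking logged signals after a simulation.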

Release

R2024b
