Hi Leonardo,
Using an external action port in a Reinforcement Learning (RL) block can be a powerful method to facilitate the training of RL agents by leveraging an existing control system, such as your TECS (Total Energy Control System) algorithm. This approach can help guide the RL agent by providing it with actions that are known to be effective, potentially speeding up the learning process.
How Learning with External Actions Works
When an external action is supplied, the agent's neural networks can indeed update their weights and biases from the actions of the external controller: the resulting experience (observation, external action, reward, next observation) is recorded as if the agent itself had taken that action, so an off-policy agent can learn from it. In effect, the external actions act as a supervised learning signal: the RL agent first learns to mimic the external controller and then gradually takes over control as it becomes more proficient.
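To make the "supervised signal" idea concrete, here is a minimal Python/PyTorch sketch of pre-training an actor network to imitate logged TECS actions (behavior cloning). It is illustrative only, not the RL block's API; the network sizes, the `actor`, `obs_batch`, and `tecs_action_batch` names, and the random stand-in data are all assumptions you would replace with your own observations and logged TECS commands.

```python
import torch
import torch.nn as nn

# Hypothetical actor network; observation and action sizes are placeholders.
actor = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Tanh(),  # action normalized to [-1, 1]
)
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

# In practice these would be logged while TECS flies the plant;
# random tensors stand in for them here.
obs_batch = torch.randn(256, 4)
tecs_action_batch = torch.tanh(torch.randn(256, 1))

for epoch in range(200):
    pred_action = actor(obs_batch)
    imitation_loss = nn.functional.mse_loss(pred_action, tecs_action_batch)
    optimizer.zero_grad()
    imitation_loss.backward()
    optimizer.step()
```

An actor warm-started this way can serve as the initial policy before reinforcement learning proper takes over.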
Steps to Implement Learning with External Actions
- Parallel Execution: Run the TECS algorithm in parallel with the RL agent. The TECS algorithm provides the actions that are used as a reference for the RL agent.
- External Actions Input: Use the external action port of the RL block to feed the actions from the TECS algorithm into the RL agent. This allows the RL agent to observe both the state of the system and the actions taken by the TECS algorithm.
- Warm-up Phase: Start with the RL agent observing and learning from the TECS actions. During this phase, the agent tries to mimic the TECS actions as closely as possible.
- Gradual Transition: Gradually reduce the dependency on the TECS actions and allow the RL agent to take more control. This can be done by slowly decreasing the weight given to the external actions (for example, an imitation term in the loss function) or by using an "on-off" approach where the external actions are turned off after a certain period; see the sketch after this list.
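As one concrete way to realize the gradual transition, the sketch below anneals a blending fraction from pure TECS control to pure RL control over training episodes. This is an illustrative scheme, not a built-in feature of the RL block; the function names, episode counts, and the idea of blending the two commands before they reach the plant (or the external action port) are assumptions to tune for your aircraft model.

```python
import numpy as np

def blend_fraction(episode, hold_episodes=100, decay_episodes=400):
    """Fraction of the TECS action in the executed command.

    Stays at 1.0 (pure TECS) for `hold_episodes`, then decays linearly
    to 0.0 (pure RL agent) over the next `decay_episodes`.
    """
    if episode < hold_episodes:
        return 1.0
    frac = (episode - hold_episodes) / decay_episodes
    return max(1.0 - frac, 0.0)

def executed_action(episode, agent_action, tecs_action):
    """Blend the two commands for the current training stage."""
    alpha = blend_fraction(episode)
    return alpha * np.asarray(tecs_action) + (1.0 - alpha) * np.asarray(agent_action)
```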
On-Off Approach vs Continuous Injection
- Continuous Injection: Continuously feeding the TECS actions to the RL agent provides a consistent learning signal. However, because the agent's own actions are never executed, it may struggle to learn to act independently or to explore beyond what TECS already does.
- On-Off Approach: Starting with external actions and then turning them off after a certain period can be effective. The RL agent learns from TECS during the initial phase and then takes over control on its own once the external actions are switched off, giving it an explicit transition from supervised imitation to pure reinforcement learning; a minimal gating sketch follows below.
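A minimal sketch of such a gate, assuming you can toggle whether the external action is executed (for example, by driving a switch placed in front of the external action input). The helper name and the warm-up length are placeholders.

```python
def use_external_action(episode, warmup_episodes=200):
    """On-off gate for the external action.

    Returns 1 while the TECS action should be executed (warm-up phase)
    and 0 once the RL agent should act on its own. `warmup_episodes`
    is a tuning knob: too short and the agent learns little from TECS,
    too long and training time is spent only imitating the controller.
    """
    return 1 if episode < warmup_episodes else 0

# e.g. use_external_action(50) -> 1 (warm-up), use_external_action(300) -> 0
```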