Adding Fixed Time Delay to RL Agent Observation Signal Without Impacting Simulation Performance

I'm working on an RL model in Simulink where I would like to pass a [3x1] signal to the RL agent along with the values of that signal from 4 previous moments in time, separated by increments of 0.1 s. I should note that these 4 previous moments do not correspond to the step size of the simulation, as the model uses a Variable-Step solver.
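For context, the observation I have in mind stacks the current measurement with the 4 delayed copies into a single [15x1] vector. A rough sketch of that specification is below (the model name, block path, and action spec are only placeholders, not my actual setup):

% Rough sketch of the intended observation spec (names are placeholders)
obsInfo = rlNumericSpec([15 1]);      % current [3x1] plus 4 delayed [3x1] copies
obsInfo.Name = 'stackedObservation';

actInfo = rlNumericSpec([1 1]);       % placeholder action spec, not the real one

% Environment wired to the Simulink model and the RL Agent block
env = rlSimulinkEnv('myModel', 'myModel/RL Agent', obsInfo, actInfo);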
The motivation for doing this is to provide some temporal information to the agent. I'm aware that using an RNN within my agent could help with capturing temporal information, but for now I am avoiding an RNN because I have read that it would prevent me from generating code from the agent, which I will need at a later point in time.
The main issue I am having is that introducing these time-delayed signals has caused the training of my RL agent to slow to a snail's pace. From the 2015 article Why you should never break a continuous algebraic loop with a Memory block, I know to avoid Memory blocks and Unit Delays, as these can cause the solver to reset at each time step. What I thought would be a good workaround is the Transport Delay block; however, this too causes the training to run far too slowly. I should also mention that I have tried Unit Delay blocks in place of the Transport Delay blocks, but this yields a similar result.
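For reference, each of the four delays is currently a plain Transport Delay block, configured roughly as follows (the model name and block paths are illustrative; only the delay times of 0.1 s, 0.2 s, 0.3 s and 0.4 s reflect my actual setup):

% Illustrative configuration of the existing Transport Delay blocks
mdl = 'myModel';                                      % placeholder model name
for k = 1:4
    blk = sprintf('%s/Transport Delay %d', mdl, k);   % placeholder block path
    set_param(blk, 'DelayTime', num2str(0.1*k));      % 0.1 s, 0.2 s, 0.3 s, 0.4 s
end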
If I run the model with a conventional controller in place of the RL Agent block, with the observations connected to the controller (and the reward & IsDone signals simply terminated), the simulation runs at a good speed without any issues. Additionally, if I run the training with the RL Agent in place and comment out the transport delay blocks, so that the same [3x1] signal is concatenated together 5 times to form the observation signal, the simulation also runs at a good pace. With all this in mind, I am quite confident that the issue has something to do with a transport delay block driving part of the observation signal for the RL Agent.
Is anyone aware of any better alternatives for implementing some sort of "memory" in the observation signal of my RL Agent without drastically impacting simulation time?
The part of my model which 'assembles' the observation signal with measurements from previous points in time can be seen below. Note that the subsystem in the top left contains three transfer functions to break the algebraic loop in the observation signal. The subsystems positioned below each transport delay provide the initial condition before the transport delay block begins outputting the delayed input signal. Using the initial condition parameter of the block does not work for me because the value changes at the beginning of each episode when the environment ResetFcn() is called, and having a changing variable in the parameter field is not allowed.
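For completeness, the reset changes this value at the start of every episode roughly along these lines (the variable and helper names here are only placeholders for my actual reset logic):

% Sketch of the environment reset: the initial measurement changes each episode
function in = myResetFcn(in)
    ic = 2*rand(3,1) - 1;                          % hypothetical randomized initial measurement
    in = setVariable(in, 'initMeasurement', ic);   % value the delayed observations should start from
end

The handle is then assigned to env.ResetFcn when the environment is created, which is why the delayed signals need a per-episode initial condition rather than a fixed block parameter.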

Answers (0)
