DDPG: Actor clips outputs to zero, thus keeping exploration minimal
I'm training a DDPG agent from the Reinforcement Learning Toolbox to adjust a PI controller, so the agent should output the gains P and I. After some initial learning episodes (roughly 10 to 50) with high values for both P and I, both outputs decrease to zero.
This is followed by either of two cases, switching from time to time:
- Both output values stay at zero (marked green in the following picture).
- Output I stays at zero while P takes a very low value (marked purple in the following picture).

The actor is structured as follows:
actorNetwork = [
    featureInputLayer(20, 'Normalization', 'none', 'Name', 'state vector')
    fullyConnectedLayer(20, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(256, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(2, 'Name', 'fc3')
    tanhLayer('Name', 'output')];
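Since the tanh output layer bounds the raw action to [-1, 1], the network output has to be mapped onto the actual gain range at some point. Below is a minimal sketch of one way this could look, assuming the limits implied further down (roughly 0 to 89.43 per gain) and a 20-element observation; obsInfo, actInfo and the scaling values are assumptions, not taken from discreteSys_Script_05.m:
% Observation and action specifications (limits are an assumption derived
% from "1% of the action range may be 0.8943"):
obsInfo = rlNumericSpec([20 1]);
actInfo = rlNumericSpec([2 1], 'LowerLimit', [0; 0], 'UpperLimit', [89.43; 89.43]);

% Stretch tanh's [-1, 1] output onto the assumed gain range [0, 89.43]:
actorNetwork = [
    actorNetwork
    scalingLayer('Name', 'action', 'Scale', 44.715, 'Bias', 44.715)];

actor = rlDeterministicActorRepresentation(actorNetwork, obsInfo, actInfo, ...
    'Observation', {'state vector'}, 'Action', {'action'});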
The PI controller is used to control a transfer function while a timed disturbance occurs. The disturbance is always identical.
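For context, a rough stand-in for that closed loop is sketched below; the plant G, the reference, the disturbance shape and its timing are all placeholders (the real ones are in discreteSys_Script_05.m), as are the example gains:
s  = tf('s');
G  = 1/(0.5*s^2 + s + 1);              % assumed plant transfer function
Kp = 10;  Ki = 5;                      % example gains chosen by the agent
C  = pid(Kp, Ki);                      % PI controller

t     = (0:0.001:5)';
n_ref = ones(size(t));                 % assumed constant speed reference
d     = -0.2*(t >= 2);                 % assumed timed, always-identical disturbance

% Actual speed: reference tracking plus the disturbance entering at the plant input
n_act = lsim(feedback(C*G, 1), n_ref, t) + lsim(feedback(G, C), d, t);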
The fitness function used is the IAE value of the speed error:
IAE = ∫ |n_ref - n_act| dt
The reward is then calculated with this formula:
r = r1*(2*exp(r2*I/In)-r3) + p;
where r1, r2, r3 are constants, I is the DDPG agent's IAE value, In is the IAE value of the reference system, and p is a punishment that is capped to [-15, 0]:
p = -max(|n_ref-n_act|²) * p1;
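Put into code, the two formulas read roughly as follows; the constants and the reference IAE In are placeholders, and t, n_ref, n_act are assumed to hold the time vector, the speed reference and the simulated actual speed of one episode:
r1 = 10;  r2 = -1;  r3 = 1;  p1 = 50;   % assumed constants
In = 0.8;                               % assumed IAE of the reference system

e = n_ref - n_act;                      % speed error
I = trapz(t, abs(e));                   % IAE of the agent-tuned loop

p = -max(e.^2) * p1;                    % punishment ...
p = min(max(p, -15), 0);                % ... capped to [-15, 0]

r = r1*(2*exp(r2*I/In)-r3) + p;         % episode reward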
What I have done so far:
- tried to recreate a paper's solution
  - the agent takes one action per episode, when the disturbance is detected
  - copied the transfer function, network sizes, observation and all options (critic, actor, DDPG agent, training)
  - added a flexible punishment (so the system does not oscillate)
- adjusted the range of the punishment to the range of the reward
- changed the gradient threshold from Inf to 1
- set lower and upper limits within actionInfo
- set the noise StandardDeviation to different values (see the sketch after this list)
  - currently 0.1
  - 1% of the action range would be 0.8943 and 10% would correspond to 8.943
  - with StandardDeviation = 0.8943, I stays at zero; P explores a bit and then stays at its maximum value
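For reference, here is a minimal sketch of where the last three settings would go in the toolbox API; apart from the values quoted above, everything here (including the omitted critic, actor and environment) is an assumption:
% Gradient threshold 1 instead of Inf for the actor and critic representations:
repOpts = rlRepresentationOptions('GradientThreshold', 1);

% Exploration noise of the DDPG agent (0.8943 would be 1% of the action
% range, 8.943 would be 10%):
agentOpts = rlDDPGAgentOptions();
agentOpts.NoiseOptions.StandardDeviation = 0.1;   % current value

% The lower/upper action limits belong in the rlNumericSpec used as actInfo.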
Many thanks in advance!
discreteSys_Script_05.m is the main script.