Reinforcement learning DDPG Agent semi active control issue

Question

0 votes

Dear MATLAB community,

I have implemented a reinforcement learning agent (DDPG) to control a semi-active suspension system in Simulink for my master's thesis. The Simulink model is a half-car model with two tires connected to a chassis body, and the agent should control the variable dampers of the front and rear axles. However, in every training session, even with a large number of episodes, the DDPG agent only learns a suboptimal control strategy. In most cases it settles on the lowest possible damping for the rear axle and the maximum for the front axle, with only tiny control adjustments (example in the picture).

Description of the model:

13 continuous observations
2 continuous actions
Reward function: negative squared chassis acceleration and pitch acceleration
Reset function loads a pseudorandom road profile each episode
Damping coefficient from 900 to 4300 Ns/m
Each episode lasts 10 seconds

I have tried all of the following changes, and the results are mostly the same:

'NumHiddenUnit' set to 25 and 256
Actor learning rate of 1e-3 and 1e-4
With and without parallel computing
300, 1500 and 2000 episodes

My questions:

What is wrong with my agent that it only makes small control adjustments?
Is it possible that my DDPG agent doesn't explore enough?

Sorry for my bad English, and thank you all for your help.

%% Agent creation
% mdl, agentBlock, tS and hfmParam are assumed to exist in the workspace (not shown).
% Action space: two damper commands bounded by the damping limits
actInfo = rlNumericSpec([2 1], ...
    'LowerLimit', hfmParam.dA.value(1), ...
    'UpperLimit', hfmParam.dA.value(2));
% Observation space: 12 unbounded signals plus one signal limited to [0, 40]
obsInfo = rlNumericSpec([13 1], ...
    'LowerLimit', [-inf(12,1); 0], ...
    'UpperLimit', [ inf(12,1); 40]);
%% Environment
env = rlSimulinkEnv(mdl, agentBlock, obsInfo, actInfo);
env.ResetFcn = @(in)localResetFcn(in);
% Agent options (noise options are left at their defaults here)
agentOpts = rlDDPGAgentOptions('SampleTime', tS);
% Default networks with 2*13-1 = 25 hidden units per layer
knnOpts = rlAgentInitializationOptions('NumHiddenUnit', obsInfo.Dimension(1)*2-1);
% Agent
agent = rlDDPGAgent(obsInfo, actInfo, knnOpts, agentOpts);
% Critic learning rate
critic = getCritic(agent);
critic.Options.LearnRate = 1e-3;
agent = setCritic(agent, critic);
% Actor learning rate
actor = getActor(agent);
actor.Options.LearnRate = 1e-4;
agent = setActor(agent, actor);


Answer 1

Emmanouil Tzorakoleftherakis on 29 Mar 2021

1 vote

Hello,

This is very open-ended, so there could be many ways to improve your setup. My guess is that the issue is closely related to the exploration question you raise above. If the agent does not explore enough, all the other parameters you played with won't make much difference.

First, it is important to understand how exploration works for DDPG. Literally, what happens is that we add noise sampled from a noise model to the deterministic policy output (step 1 here). If the parameters of the noise model are not tuned well, the added noise will be very small compared to your action range, so the agent will not explore (which I suspect is what happens, given that you do not tune the noise options in your code above).
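To make the scale mismatch concrete, here is a minimal sketch in base MATLAB (no toolbox needed) that simulates one episode of Ornstein-Uhlenbeck noise and compares it with the 900 to 4300 Ns/m action range from the question. The update rule follows the form given in the rlDDPGAgentOptions documentation as I understand it. The sample time `tS = 0.01` is a hypothetical value, and `theta = 0.15` and `sigma = 0.3` are what I believe the default noise parameters to be; check them against your release.

```matlab
% Sketch: how large is untuned OU exploration noise relative to the action range?
rng(0);                       % fixed seed so the run is repeatable

tS     = 0.01;                % agent sample time in s (hypothetical)
nSteps = round(10 / tS);      % one 10 s episode, as in the question
theta  = 0.15;                % mean attraction constant (assumed default)
sigma  = 0.3;                 % noise Variance/StandardDeviation parameter (assumed default)
mu     = 0;                   % noise mean

% Discrete OU update: pull toward the mean plus a scaled Gaussian increment
x = zeros(nSteps, 1);
for k = 1:nSteps-1
    x(k+1) = x(k) + theta*(mu - x(k))*tS + sigma*sqrt(tS)*randn;
end

actRange = 4300 - 900;        % action range in Ns/m (from the question)
peakNoiseFraction = max(abs(x)) / actRange;
fprintf('Peak noise is %.4f%% of the action range\n', 100*peakNoiseFraction);
```

Under these assumptions the noise stays on the order of 1 Ns/m while the action spans 3400 Ns/m, so the perturbed action is practically identical to the deterministic policy output.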

Please take a look at this note in the doc. At a minimum, you should make sure that the variance of the noise model is between 1% and 10% of your action range. Then you can play with the variance decay rate. That should help you make some progress.
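As a rough sketch of that tuning step: the documentation note, as I read it, applies the 1% to 10% rule to the noise parameter multiplied by `sqrt(Ts)`, which is what the snippet below does. The sample time `tS = 0.01`, the 5% target, and the decay rate of `1e-5` are illustrative assumptions, not values from the question. The property names `Variance` and `VarianceDecayRate` match older releases; more recent releases name them `StandardDeviation` and `StandardDeviationDecayRate`, so adjust accordingly. Requires Reinforcement Learning Toolbox.

```matlab
% Sketch: size the DDPG exploration noise relative to the action range.
actLow  = 900;                % lower damping limit in Ns/m (from the question)
actHigh = 4300;               % upper damping limit in Ns/m (from the question)
tS      = 0.01;               % agent sample time in s (hypothetical)

actRange       = actHigh - actLow;   % 3400 Ns/m
targetFraction = 0.05;               % aim for 5% of the range (assumed choice)

% Choose the noise parameter so that noiseVariance*sqrt(tS) = 5% of the range
noiseVariance = targetFraction * actRange / sqrt(tS);

% Decay: number of agent steps until the noise level has halved
decayRate     = 1e-5;                % illustrative value
halfLifeSteps = log(0.5) / log(1 - decayRate);

agentOpts = rlDDPGAgentOptions('SampleTime', tS);
agentOpts.NoiseOptions.Variance          = noiseVariance;  % StandardDeviation in newer releases
agentOpts.NoiseOptions.VarianceDecayRate = decayRate;      % StandardDeviationDecayRate in newer releases
```

With 1000 steps per episode under these assumptions, the half-life above corresponds to several tens of episodes, so a run of 300 episodes would see the noise shrink noticeably. Pass `agentOpts` to `rlDDPGAgent` as in the question's code.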

Comments

Emmanouil Tzorakoleftherakis on 5 Apr 2021

Setting appropriate noise parameters is a necessary step for a correct problem formulation, but it does not guarantee successful learning. If the agent's actions during training make sense, i.e., if the agent is exploring reasonable values, the next thing to look at is your reward signal.

Maha Mosalam on 1 Dec 2021

Hello,

If my action range is very small, for example between -0.001 and 0.001, how should I choose the exploration parameters? At the moment the action does not actually change between steps. Any help with that?
