Can I decide the RL agent's actions?

I am training a PPO agent, and the issue is that it keeps searching for a better value even after getting close to a stable state.
What I mean is that I want my agent to keep applying its last action values as soon as the error reaches <= 0.05 (to prevent oscillations and offset near the set point, as shown in the shared image).
My question is: can I do this in MATLAB? I know it can be done in Python. Any help would be really helpful :)

3 Comments

Could you please show the Python code? This will enable us to see how to make comparisons and implement that in MATLAB code.
Actually, I saw it in an IEEE paper, and when I asked the author he told me he was using Python.
I don't have any code with me right now, but I feel there must be a way to decide the action of my agent.
OK, I might get some code after a week or so.
But all I want is to make the actions of my PPO agent settle after some time, rather than behave as shown in the attached image.


Answers (2)

I believe that it has something to do with the StopTrainingCriteria and StopTrainingValue options of your rlTrainingOptions object. Is the condition "steady-state error ≤ 0.05" reflected in the training termination condition? Typically, the agent will continue to train until MaxEpisodes is reached when the stopping condition is not satisfied.
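maxepisodes = 6000;   % upper bound on the number of training episodes
maxsteps = 150;       % upper bound on the number of steps per episode
trainingOpts = rlTrainingOptions(...
    'MaxEpisodes', maxepisodes,...
    'MaxStepsPerEpisode', maxsteps,...
    'ScoreAveragingWindowLength', 5, ...
    'Verbose', false,...
    'Plots', 'training-progress',...
    'StopTrainingCriteria', 'AverageReward',...
    'StopTrainingValue', 1500);   % stop once the 5-episode average reward reaches 1500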
Also, please note that the rewards obtained by the final agents are not necessarily the greatest achieved during the training episodes. You need to save the agents that meet the "steady-state error ≤ 0.05" condition during training by specifying the SaveAgentCriteria and SaveAgentValue properties in the rlTrainingOptions object.
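For reference, a sketch of how those save options might be set. The threshold of 1400 and the folder name are hypothetical values chosen for illustration. Note that SaveAgentCriteria works on reward statistics, so it would capture the error condition only indirectly, through however the reward function encodes it.

```matlab
% Sketch only: the numeric threshold and directory name are assumed values.
trainingOpts = rlTrainingOptions( ...
    'MaxEpisodes', 6000, ...
    'MaxStepsPerEpisode', 150, ...
    'SaveAgentCriteria', 'EpisodeReward', ...  % save whenever an episode reward...
    'SaveAgentValue', 1400, ...                % ...reaches this (hypothetical) value
    'SaveAgentDirectory', 'savedAgents');      % hypothetical output folder
```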

2 Comments

Then why are the DDPG and TD3 agents working fine?
It has nothing to do with the stop-training criteria. I just want my agent's outputs to settle at the previous value as soon as the error reaches 0.05 within a training episode.


Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 25 Sep 2023


It seems like the paper you saw uses some logic to implement the behavior you mention. You could do the same with an if statement in MATLAB.
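A minimal sketch of what such an if statement could look like. It is not taken from the paper or from any toolbox example: the plant, the stand-in policy, the set point, and all variable names are assumptions made so the snippet runs without Reinforcement Learning Toolbox. In a real setup the proposed action would come from the agent, and the same check could sit in the environment's step function or in a block between the agent and the plant.

```matlab
% Hypothetical hold-last-action logic; every name and number here is assumed.
tol      = 0.05;   % error threshold from the question
setpoint = 1.0;    % assumed reference value
nSteps   = 150;    % assumed episode length

% Stand-ins so the sketch is self-contained (not a trained PPO agent):
agentAction = @(err, k) setpoint + 2.0*err + 0.3*sin(k);  % keeps "searching"
plantStep   = @(y, u) y + 0.2*(u - y);                    % toy first-order plant

y          = 0;                    % plant output
lastAction = 0;                    % most recently applied action
errLog     = zeros(nSteps, 1);
actLog     = zeros(nSteps, 1);
heldLog    = false(nSteps, 1);     % true where the previous action was reused

for k = 1:nSteps
    err      = setpoint - y;
    proposed = agentAction(err, k);   % in practice, the agent's output

    if k > 1 && abs(err) <= tol
        applied    = lastAction;      % inside the band: hold the previous action
        heldLog(k) = true;
    else
        applied    = proposed;        % outside the band: follow the agent
    end

    y          = plantStep(y, applied);
    lastAction = applied;
    errLog(k)  = err;
    actLog(k)  = applied;
end
```

In this sketch the hold is released again if the error drifts back above the threshold. One caveat worth checking for your setup: if the override is applied during training, the action the agent proposed and the action the plant actually received differ, which may affect what PPO learns, so it may be safer to apply it only when running the trained agent.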

1 Comment

Do you mean in my script or in my environment?
Could you give an example?



Asked: 2 Sep 2023

Commented: 28 Oct 2023
