Difference RL Agent training plot and result plot

Question

sungho park on 17 Jan 2022


https://it.mathworks.com/matlabcentral/answers/1630320-difference-rl-agent-training-plot-and-result-plot

Commented: Emmanouil Tzorakoleftherakis on 24 Jan 2023

Hi, the first graph below shows the action during training, and the second shows the action after training, which is different: it is just constant.

Can you please help me?

%% Create observation specification
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
numObs = obsInfo.Dimension(1);

%% Create action specification
actInfo = rlNumericSpec([1 1],'LowerLimit',-50,'UpperLimit',50);
%actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'current';
numActions = actInfo.Dimension(1);

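%% Create the environment
% mdl (the Simulink model name) and param (with fields dt and end_time)
% are not defined in this snippet and must already exist in the workspace.
blk = [mdl '/RL Agent'];
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);
env.ResetFcn = @(in)setVariable(in,'current',5,'Workspace',mdl);
env.UseFastRestart = 'off';
Ts = param.dt;
Tf = param.end_time;
rng(0)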

%% Create DDPG Agent
% Critic: separate state and action paths, summed and mapped to a scalar Q-value
statePath = [
    featureInputLayer(numObs,'Normalization','none','Name','observations')
    fullyConnectedLayer(200,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(200,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(1,'Normalization','none','Name','action')
    fullyConnectedLayer(200,'Name','CriticActionFC1','BiasLearnRateFactor',0)];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
figure
plot(criticNetwork)
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);

%% Create the critic representation using the specified deep neural
% network and options
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation', ...
    {'observations'},'Action',{'action'},criticOpts);

%% Create the actor
actorNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','observations')
    fullyConnectedLayer(400,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(300,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(1,'Name','ActorFC3')
    % tanh bounds the output to [-1, 1]; the scaling layer maps it to the action limits
    tanhLayer('Name','ActorTanh')
    scalingLayer('Name','ActorScaling','Scale',max(actInfo.UpperLimit))
    ];
actorOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'observations'},'Action',{'ActorScaling'},actorOpts);

%% Create the DDPG agent options
agentOpts = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e6,...
    'SaveExperienceBufferWithAgent',true,...
    'DiscountFactor',0.99,...
    'ResetExperienceBufferBeforeTraining',false,...
    'MiniBatchSize',256);
% Exploration noise added to the action during training (no decay)
agentOpts.NoiseOptions.Variance = 0.2;
agentOpts.NoiseOptions.VarianceDecayRate = 0;
agent = rlDDPGAgent(actor,critic,agentOpts);

%% Train Agent
maxepisodes = 10;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'ScoreAveragingWindowLength',5,...
    'Verbose',true,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',5000,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',5000);
doTraining = true;
%end
trainingStats = train(agent,env,trainOpts);
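One likely reason the two plots differ: while training, a DDPG agent adds exploration noise to the actor output, whereas a simulation of the trained agent uses the actor output alone. The sketch below reproduces, in plain MATLAB, the noise recursion documented for the DDPG noise options, using the Variance of 0.2 from the code above. Ts = 0.1 and numSteps = 200 are placeholders, since param.dt and param.end_time are not shown in the post; a mean of 0 and a mean attraction constant of 0.15 are assumed to be the defaults.

```matlab
% Sketch of DDPG exploration noise (Ornstein-Uhlenbeck form).
% Assumptions (not from the post): Ts and numSteps are placeholders;
% mu = 0 and theta = 0.15 are assumed default noise settings.
Ts       = 0.1;    % hypothetical sample time (the post uses param.dt)
numSteps = 200;    % hypothetical episode length
theta    = 0.15;   % MeanAttractionConstant (assumed default)
mu       = 0;      % Mean (assumed default)
sigma    = 0.2;    % Variance, as set in the post

rng(0)                        % fixed seed for repeatability
noise = zeros(numSteps,1);    % noise state starts at zero
for k = 2:numSteps
    % Pull toward the mean, plus a random kick scaled by sqrt(Ts)
    noise(k) = noise(k-1) + theta*(mu - noise(k-1))*Ts + sigma*randn*sqrt(Ts);
end

actionRange = 50 - (-50);     % from the actInfo limits in the post
fprintf('Largest noise magnitude: %.3f (action range %g)\n', ...
    max(abs(noise)), actionRange);
fprintf('Variance*sqrt(Ts) as a fraction of the range: %.4f\n', ...
    sigma*sqrt(Ts)/actionRange);
```

If I recall the rlDDPGAgentOptions documentation correctly, it suggests keeping Variance*sqrt(Ts) between roughly 1% and 10% of the action range, so comparing the printed fraction against that guideline may indicate whether the agent explores enough for an action range of 100.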

1 Comment

Emmanouil Tzorakoleftherakis on 24 Jan 2023

How long did you train for? If you only trained for 10 episodes, you should give it more time.
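Following that suggestion, here is a minimal sketch of training for longer and then inspecting the action the trained agent produces in simulation. It assumes agent, env, trainOpts and maxsteps from the question are already in the workspace, and the episode count of 500 is an arbitrary placeholder rather than a recommended value.

```matlab
% Sketch only: requires the Simulink model and variables from the question.
trainOpts.MaxEpisodes = 500;              % placeholder, tune for your problem
trainingStats = train(agent,env,trainOpts);

% Simulate the trained agent; exploration noise is not added here,
% so this trace will generally look different from the training plot.
simOpts    = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOpts);

% The action channel is named 'current' in actInfo, so the logged actions
% are expected as a timeseries in experience.Action.current.
act = squeeze(experience.Action.current.Data);
figure
plot(experience.Action.current.Time,act)
xlabel('Time')
ylabel('current')

% A constant trace at +50 or -50 would suggest the tanh layer is saturated.
fprintf('Action min/max: %g / %g\n',min(act),max(act));
```

If the simulated action is still constant after longer training, the printed min/max values help tell a saturated actor (stuck at a limit) apart from one that simply has not learned a state-dependent policy yet.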

Answers (0)
