Reinforcement Learning: sudden very high rewards during training of an RL model
During training I get sudden, very high rewards on the order of 10e16 (shown in the attached image), and I am unable to figure out what is causing this. Here is the code I am using; I am also attaching the Simulink model.
Tf = 10;  % episode duration (s)
Ts = 0.1; % agent sample time (s)
mdl = 'rl_exam2';
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, Response';
numObservations = obsInfo.Dimension(1);
actInfo = rlNumericSpec([1 1],'LowerLimit',0,'UpperLimit',1);
actInfo.Name = 'Control Input';
numActions = actInfo.Dimension(1);
%% To Create Environment
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);
%% Fix the random seed for reproducibility
rng(0)
%% To Create Critic Network
statePath = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(50,'Name','CriticStateFC1')
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(40,'Name','CriticStateFC2')];
actionPath = [
imageInputLayer([numActions 1 1],'Normalization','none','Name','Action')
fullyConnectedLayer(40,'Name','CriticActionFC1')];
commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},criticOpts);
%% To Create Actor Network
actorNetwork = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','State')
fullyConnectedLayer(40,'Name','actorFC1')
reluLayer('Name','ActorRelu1')
fullyConnectedLayer(numActions,'Name','actorFC2')
tanhLayer('Name','actorTanh')
scalingLayer('Name','Action','Scale',0.5,'Bias',0.5)
];
actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},actorOptions);
%% To Create Agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',0.1,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',1,...
'ExperienceBufferLength',1e6,...
'MiniBatchSize',64);
agentOpts.NoiseOptions.Variance = 0.08;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts);
%% Training Options
maxepisodes = 3000;
maxsteps = ceil(Tf/Ts);
trainingOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'ScoreAveragingWindowLength',20, ...
'Verbose',false,...
'Plots','training-progress',...
'StopTrainingCriteria','EpisodeCount',...
'StopTrainingValue',1500);
%% TO TRAIN
doTraining = true;
if doTraining
trainingStats = train(agent,env,trainingOpts);
% save('agent_new.mat','agent') % uncomment to save the trained agent
else
% Load pretrained agent for the example.
load('agent_old.mat','agent')
end
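To narrow down where the spikes occur, one option is to scan the per-episode rewards for outliers. The following is only a minimal sketch: the reward vector is made-up illustrative data, not values from my run, and the cutoff rule is a hypothetical choice. After `train` returns, the real values should be available as `trainingStats.EpisodeReward`.

```matlab
% Illustrative reward vector (made-up values, NOT from the actual training run).
% In practice, use: episodeReward = trainingStats.EpisodeReward;
episodeReward = [-12.4, -9.8, -10.1, 3.2e16, -8.7, -9.0];

% Hypothetical cutoff: flag rewards far above the typical magnitude.
% The median is used because it is barely affected by a few huge spikes.
typical = median(abs(episodeReward));
cutoff = 1e3 * max(typical, 1);

% Indices of the episodes whose reward magnitude exceeds the cutoff.
spikeEpisodes = find(abs(episodeReward) > cutoff);

fprintf('Episodes with suspicious rewards: %s\n', mat2str(spikeEpisodes));
```

The flagged episodes can then be re-simulated and the reward signal in the Simulink model inspected step by step to see which term blows up.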