Multiple reinforcement learning agents working independently

Hello everybody, I am using two TD3 RL agents to track two different references. However, I received the following result in the reward plot: as you can see, when one agent performs well, the other performs poorly, and vice versa.
Here is the code:
% Observation and action specifications for the two agents
oInfo1 = rlNumericSpec([3,1]);
oInfo2 = rlNumericSpec([3,1]);
oInfo1.Name = 'observations';
oInfo2.Name = 'observations';
numObservations = oInfo1.Dimension(1);
act1 = rlNumericSpec([3,1]);
act2 = rlNumericSpec([3,1]);
numActions = act1.Dimension(1);

% Two-agent Simulink environment
obsInfo = {oInfo1,oInfo2};
actInfo = {act1,act2};
agentblk = ["PV/Control_rll/Agent A", "PV/Control_rll/Agent B"];
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
Ts = 1e-2;

% Critic network (same architecture used for both agents)
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(32,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','Action')
    fullyConnectedLayer(32,'Name','CriticActionFC1')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(32,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(16,'Name','fc4')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
criticOpts = rlRepresentationOptions('LearnRate',1e-02,'GradientThreshold',1);
criticA = rlQValueRepresentation(criticNetwork,oInfo1,act1,'Observation',{'State'},'Action',{'Action'},criticOpts);
criticB = rlQValueRepresentation(criticNetwork,oInfo2,act2,'Observation',{'State'},'Action',{'Action'},criticOpts);

% Actor network (same architecture used for both agents)
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    tanhLayer('Name','actorTanh1')
    fullyConnectedLayer(32,'Name','actorFC2')
    tanhLayer('Name','actorTanh2')
    fullyConnectedLayer(numActions,'Name','Action')
    ];
actorOptions = rlRepresentationOptions('LearnRate',1e-02,'GradientThreshold',1);
actorA = rlDeterministicActorRepresentation(actorNetwork,oInfo1,act1,'Observation',{'State'},'Action',{'Action'},actorOptions);
actorB = rlDeterministicActorRepresentation(actorNetwork,oInfo2,act2,'Observation',{'State'},'Action',{'Action'},actorOptions);

% TD3 agents
agentOpts = rlTD3AgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'DiscountFactor',.997, ...
    'MiniBatchSize',64, ...
    'ExperienceBufferLength',1e6);
agentA = rlTD3Agent(actorA,criticA,agentOpts);
agentB = rlTD3Agent(actorB,criticB,agentOpts);

% Training options
maxsteps = ceil(6/Ts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',5000,...
    'MaxStepsPerEpisode',maxsteps,...
    'ScoreAveragingWindowLength',20, ...
    'Verbose',true, ...
I know that since R2020b the agent neural networks are updated independently. However, I can see that since R2022a a learning strategy can be selected for each agent group (either "decentralized" or "centralized"), where decentralized training means that each agent collects its own set of experiences during the episodes and learns independently of the other agents.
My question is: do I need to use R2022a, or is my problem in the environment definition? For reference, the R2022a feature I am referring to would be configured roughly as in the sketch below (based on the rlMultiAgentTrainingOptions documentation quoted above; the agent grouping shown is only illustrative).
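% Sketch only, assuming R2022a or later and the rlMultiAgentTrainingOptions
% interface described in the documentation of that release.
% Each agent is placed in its own group and learns only from its own experiences.
maTrainOpts = rlMultiAgentTrainingOptions( ...
    'AgentGroups',{1,2}, ...                % agent 1 and agent 2 in separate groups
    'LearningStrategy','decentralized', ... % no sharing of experiences between groups
    'MaxEpisodes',5000, ...
    'MaxStepsPerEpisode',maxsteps);

% Train both agents against the two-agent Simulink environment defined above.
trainingStats = train([agentA, agentB], env, maTrainOpts);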

Accepted Answer

Emmanouil Tzorakoleftherakis
Centralized learning makes learning and exploration more efficient because the agents share things like experiences. If the agents perform similar or collaborative tasks, this can speed up training. If the tasks are inherently different, you should probably go with decentralized learning.
That said, training multiple agents simultaneously is challenging because, from each agent's point of view, the other agent makes the environment non-stationary, which violates the Markov assumption. To help with that, you should share as much information between the agents as possible. At the very minimum, the actions of one agent should be part of the observations of the other, and vice versa. A minimal sketch of what that could mean for the observation specs in your code is shown below (the dimensions are illustrative, and the observation signals in the Simulink model would have to be extended to match).
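% Illustrative sketch: enlarge each observation spec so that each agent also
% observes the other agent's 3-element action. The observation signals feeding
% "Agent A" and "Agent B" in the Simulink model must be extended accordingly,
% and the network input sizes must be updated to the new dimensions.
oInfo1 = rlNumericSpec([6,1]);   % [own 3 observations; agent B's 3 actions]
oInfo1.Name = 'observations A';
oInfo2 = rlNumericSpec([6,1]);   % [own 3 observations; agent A's 3 actions]
oInfo2.Name = 'observations B';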
1 Comment
Esan freedom on 10 Apr 2023
Thank you so much, dear Emmanouil. Sorry for the delay; I was trying to solve my problem based on your suggestions. If I understood correctly, using external actions during training was my problem: while Agent A was training, Agent B used the external action [0 0 0], and vice versa, so only one of the agents ever performed well. I therefore had to train them separately; once training was finished, the two saved agents were used together in a single Simulink model, and only in that setup were external actions used. A rough sketch of that separate-training workflow follows (the file names are illustrative, and the external-action wiring is done on the RL Agent blocks in the model).
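% Rough sketch (illustrative names): train each agent against an environment
% that exposes only its own agent block, save the trained agent, then load
% both trained agents and simulate them together in the full model.
% While one agent trains, the other RL Agent block must still resolve to an
% agent object in the workspace (or take an external action signal).
envA = rlSimulinkEnv(mdl,"PV/Control_rll/Agent A",oInfo1,act1);
trainingStatsA = train(agentA,envA,trainOpts);
save('trainedAgentA.mat','agentA');

envB = rlSimulinkEnv(mdl,"PV/Control_rll/Agent B",oInfo2,act2);
trainingStatsB = train(agentB,envB,trainOpts);
save('trainedAgentB.mat','agentB');

% Run the two trained agents together in the original two-agent environment.
load('trainedAgentA.mat','agentA');
load('trainedAgentB.mat','agentB');
sim(env,[agentA, agentB]);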
I am grateful for your help as always, dear Emmanouil.
Regards


More Answers (0)
