How to solve "Invalid input argument type or size such as observation, reward, isdone or loggedSignals."

Using CarMaker for Simulink, we are constructing a reinforcement learning environment.
The following error occurs:
--------------------------------------------------------------------------------------------------------------------------
" 다음 사용 중 오류가 발생함: rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors
(667번 라인)
Invalid input argument type or size such as observation, reward, isdone or
loggedSignals.
다음 사용 중 오류가 발생함: rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors
(667번 라인)
Unable to compute gradient from representation.
다음 사용 중 오류가 발생함:
rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (667번 라인)
요소 개수는 변경되어서는 안 됩니다. 해당 차원에 대한 적절한 크기를 자동으로 계산하려면
크기 입력값 중 하나로 []을 사용하십시오. "
--------------------------------------------------------------------------------------------------------------------------
Because I am using a plot image as the input to the network, I checked the image input part.
I checked the size of the image that goes into the input and confirmed that it stays at 128 x 128 x 3.
I think there may be a problem with how I created the deep learning network.
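For reference, this is roughly how I checked the frame (a sketch; it assumes one logged frame is available in the workspace as a variable named img, e.g. via a To Workspace block):
% Sanity check on the observation image (img is a hypothetical variable
% holding one logged frame). Both the size and the class must match the
% rlNumericSpec defined in the script below.
disp(size(img))    % expected: 128 128 3
disp(class(img))   % should match obsInfo.DataType ('uint8' in the spec below)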
mdl = 'generic';
open_system(mdl);
agentblk = [mdl '/CarMaker/VehicleControl/CreateBus VhclCtrl/RL Agent'];
% create the observation info
obsInfo = rlNumericSpec([128 128 3],'LowerLimit',-inf*ones(128,128,3),'UpperLimit',inf*ones(128,128,3),'DataType', 'uint8');
obsInfo.Name = 'observations';
obsInfo.Description = 'information on velocity error and ego velocity';
% action Info
actInfo = rlNumericSpec([1 1 1],'LowerLimit',-45,'UpperLimit',45);
actInfo.Name = 'SteeringAng';
% define environment
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
% Layer graph variable that will hold the network layers
lgraph = layerGraph();
% Action input path for training the critic
ActPath = [
imageInputLayer([1 1 1],"Name","action","Normalization","none")
fullyConnectedLayer(300,"Name","fc3","BiasLearnRateFactor",0)];
lgraph = addLayers(lgraph,ActPath);
% Image (observation) input path for training the critic
StatePath = [
imageInputLayer([128 128 3],"Name","LidarImage","Normalization","none")
convolution2dLayer([3 3],32,"Name","conv1","Padding","same")
reluLayer("Name","relu1")
maxPooling2dLayer([3 3],"Name","maxpool1","Padding","same","Stride",[2 2])
convolution2dLayer([3 3],32,"Name","conv2","Padding","same")
reluLayer("Name","relu2")
maxPooling2dLayer([3 3],"Name","maxpool2","Padding","same","Stride",[2 2])
fullyConnectedLayer(400,"Name","fc1")
reluLayer("Name","relu3")
fullyConnectedLayer(300,"Name","fc2")];
lgraph = addLayers(lgraph,StatePath);
tempLayers = [
additionLayer(2,"Name","add")
reluLayer("Name","relu4")
fullyConnectedLayer(1,"Name","fc4")];
lgraph = addLayers(lgraph,tempLayers);
lgraph = connectLayers(lgraph,"fc3","add/in2");
lgraph = connectLayers(lgraph,"fc2","add/in1");
% Assemble paths
criticNetwork = lgraph;
% Create critic representation
criticOptions = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
criticOptions.UseDevice = 'gpu';
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation',{'LidarImage'},'Action',{'action'},criticOptions);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
layers = [
imageInputLayer([128 128 3],"Name","LidarImage","Normalization","none")
convolution2dLayer([3 3],32,"Name","conv1","Padding","same")
reluLayer("Name","relu1")
maxPooling2dLayer([3 3],"Name","maxpool1","Padding","same","Stride",[2 2])
convolution2dLayer([3 3],32,"Name","conv2","Padding","same")
reluLayer("Name","relu2")
maxPooling2dLayer([3 3],"Name","maxpool2","Padding","same","Stride",[2 2])
fullyConnectedLayer(400,"Name","fc1")
reluLayer("Name","relu3")
fullyConnectedLayer(300,"Name","fc2")
reluLayer("Name","relu4")
fullyConnectedLayer(1,"Name","fc4")
tanhLayer("Name","tanh")
scalingLayer("Name","scale1","Scale",actInfo.UpperLimit)];
actorNetwork = layerGraph(layers);
actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actorOptions.UseDevice = 'gpu';
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'LidarImage'},'Action',{'scale1'},actorOptions);
% rlDDPGAgentOptions Options
agentOptions = rlDDPGAgentOptions(...
'SampleTime',0.01,...
'TargetSmoothFactor',1e-3,...
'ExperienceBufferLength',1e6,...
'DiscountFactor',0.99,...
'MiniBatchSize',128);
agentOptions.NoiseOptions.Variance = 0.6;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-6;
agent = rlDDPGAgent(actor,critic,agentOptions);
% Train Agent
maxepisodes = 5000;
maxsteps = 400;
trainingOptions = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',1000);
% doTraining = true;
% if doTraining
% % Train the agent.
% trainingStats = train(agent,env,trainingOptions);
% else
% % Load pretrained agent for the example.
% load('SimplePendulumWithImageDDPG.mat','agent')
% end
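One additional check that might surface the mismatch before training (a sketch; validateEnvironment is part of Reinforcement Learning Toolbox and briefly simulates the model against the specs):
% Optional sanity check (sketch): validateEnvironment runs a short
% simulation and errors if the observation, reward, or isdone signals
% do not match the obsInfo/actInfo specs defined above.
validateEnvironment(env)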

Answers (1)

Angelo Yeo on 21 Apr 2023
Since the Simulink model is not attached, I cannot reproduce the problem, but based on the error messages, I would consider the following points (a minimal sketch for item 1 follows this list):
  1. Check whether the problem goes away when the "UseDevice" option is changed from "gpu" to "cpu".
  2. Check whether the data type of the reward given to the RL agent changes somewhere along the way.
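A minimal sketch for item 1, reusing the variable names from the script in the question:
% Force both representations onto the CPU and recreate the agent
% (all names below come from the question's script).
criticOptions.UseDevice = 'cpu';
actorOptions.UseDevice = 'cpu';
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'LidarImage'},'Action',{'action'},criticOptions);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'LidarImage'},'Action',{'scale1'},actorOptions);
agent = rlDDPGAgent(actor,critic,agentOptions);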
  1 Comment
jihun Kim on 22 Apr 2023
  1. I changed from gpu to cpu, and the same error still occurs.
  2. The data type of the reward does not seem to be the problem.
I have attached the Simulink model; could you check it?
