Error when setting multi-dimensional actions in a MATLAB environment (Reinforcement Learning Toolbox)

As shown in the following code, three actions are set, with ranges [0.1 10], [0.1 10], and [0 pi], respectively:
%% main.m
%% Observation
ObservationInfo = rlNumericSpec([7 1]);
ObservationInfo.Name = 'Obstacle Avoidance States';
ObservationInfo.Description = 'delta_x, delta_y, delta_z, delta_L, delta_V, pusi, theta';
%% Action
ActionInfo = rlNumericSpec([3 1],'LowerLimit',[0.1 0.1 0]','UpperLimit',[10 10 pi]');
ActionInfo.Name = 'Action';
%% Environment
env = rlFunctionEnv(ObservationInfo,ActionInfo,'myStepFunction','myResetFunction');
rng(0);
InitialObs = reset(env);
Then, the actions are assigned to three variables in the function myStepFunction(Action,LoggedSignals) as follows:
function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
para_rho1 = Action(1);
para_rho2 = Action(2);
para_theta = Action(3);
......
end
Running main.m works normally. However, when running the following instruction, an error appears:
step(env,10)
Index exceeds the number of array elements (1)
Error: myStepFunction (line 23)
para_rho2 = Action(2);
Why is the dimension of the action changed to 1? How can I address this error?
If I set the variables para_rho2 and para_theta to constants and change the dimension of the action to [1 1] in rlNumericSpec, then the instruction step(env,10) executes normally.
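For context, a likely cause: step(env,action) forwards its second argument to myStepFunction as Action, so step(env,10) delivers a scalar, and indexing Action(2) then runs past the end of the array. A minimal sketch of a call that matches rlNumericSpec([3 1]) follows; the test values are arbitrary placeholders, not from the original post:
% Sketch: pass a 3-by-1 action matching the [3 1] spec;
% these test values are arbitrary but within the declared limits.
testAction = [1; 1; pi/2];
[NextObs,Reward,IsDone,LoggedSignals] = step(env,testAction);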

Answers (1)

Ryan Comeau on 29 May 2020
Hello, so I've taken a look at the rocket lander environment that MATLAB gives as an example. What they do is scale every action between 0 and 1. The actual minimum and maximum values for the actions are then stored in your environment's properties as a vector. When we hop into the step function, the actions get rescaled. I know this seems strange, and I'm not sure if it's the best approach, but I'm not a veteran of RL. So, what you should do is the following:
%% main.m
%% Observation
ObservationInfo = rlNumericSpec([7 1]);
ObservationInfo.Name = 'Obstacle Avoidance States';
ObservationInfo.Description = 'delta_x, delta_y, delta_z, delta_L, delta_V, pusi, theta';
%% Action
ActionInfo = rlNumericSpec([3 1],'LowerLimit',0,'UpperLimit',1);
ActionInfo.Name = 'Action';
%stuff...
function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
% env is not in scope inside the step function, so keep the scaling
% limits somewhere it can see them (a local vector, or a field of
% LoggedSignals); open the rocket lander example from MATLAB and take a look
actionLimits = [10 10 pi]; % upper limits of the unscaled actions
para_rho1 = Action(1).*actionLimits(1);
para_rho2 = Action(2).*actionLimits(2);
para_theta = Action(3).*actionLimits(3);
......
end
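One caveat on this scaling: plain multiplication only recovers ranges that start at 0, whereas two of the ranges in the question start at 0.1. A minimal affine-rescaling sketch, with limit vectors simply restating the ranges from the question:
% Affine rescaling sketch: map Action in [0,1] onto [lower, upper];
% the limit vectors restate the ranges from the original question.
lowerLimits = [0.1; 0.1; 0];
upperLimits = [10; 10; pi];
scaled = lowerLimits + Action(:).*(upperLimits - lowerLimits);
para_rho1 = scaled(1);
para_rho2 = scaled(2);
para_theta = scaled(3);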
Hope this helps
RC
2 Comments
wujianfa93 on 1 Jun 2020 (edited 1 Jun 2020)
Many thanks for your reply! But I think I have solved this problem. When I continued designing the corresponding DDPG training process and started training, I found that the whole program runs correctly. So I guess the verification using the function step() may not be necessary.
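For readers following along, a minimal sketch of what such a training run might look like, assuming a DDPG agent named agent has already been built for these specs; the option values are placeholders, not from the original thread:
% Hypothetical training setup; 'agent' is assumed to be a DDPG agent
% already constructed for ObservationInfo/ActionInfo.
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',500, ...
    'MaxStepsPerEpisode',200, ...
    'Verbose',false, ...
    'Plots','training-progress');
trainingStats = train(agent,env,trainOpts); % train() calls step() internally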


Release

R2020a
