When training an agent with the Reinforcement Learning Toolbox, the agent's actions saturate at the boundary of the action range.
I am using the MATLAB Reinforcement Learning Toolbox to train an agent in a custom environment. In the first training run (before any reward has been received), the actions the agent outputs are values within the action constraint range. From the second training run onward (after the reward from the first run has been received), however, the agent outputs actions at the boundary values of the constraint range, and the same happens in every subsequent run. Why is this? I can share my simplified code; could you help me find where the problem is?
function [Observation, Reward, IsDone, NextState] = newgoushi(Action, State)
E = State;
%% Reward
GT = 1000*Action(1);
NextState = E - GT;
% Reward is 0 unless GT exceeds the state E by 0.1 or more
if GT - E < 0.1
    Reward = 0;
else
    Reward = -1;
end
% The episode terminates once the zero reward is achieved
IsDone = Reward >= 0;
Observation = NextState;
end
My action is continuous, constrained to the range 0-12000; my state is also continuous, constrained to the range 5000-10000.
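A minimal sketch of how ranges like these could be declared when creating the environment with rlFunctionEnv (obsInfo, actInfo, and myResetFcn are illustrative names, not taken from the actual setup):
% Illustrative only: declare the stated ranges as environment specifications.
% myResetFcn is a hypothetical placeholder for the real reset function.
obsInfo = rlNumericSpec([1 1], 'LowerLimit', 5000, 'UpperLimit', 10000);
actInfo = rlNumericSpec([1 1], 'LowerLimit', 0, 'UpperLimit', 12000);
env = rlFunctionEnv(obsInfo, actInfo, @newgoushi, @myResetFcn);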
Answers (1)
Sanju
on 8 May 2024
Based on the code you provided, the agent's actions stay within the constraint range in the first training run because the policy has not yet been updated by any reward. Once the agent starts learning from the reward signal, it may push its actions toward, or beyond, the limits of the range as it tries to optimize its policy and maximize the reward, which is why you see boundary values from the second training run onward. This behavior is expected during learning.
To address this issue, you can clip the agent's action to the constraint range before applying it to the environment. This can be done with the min and max functions in MATLAB. Here's an example of how you can modify your step function to clip the action:
function [Observation, Reward, IsDone, NextState] = newgoushi(Action, State)
E = State;
% Clip the action to the constraint range [0, 12000]
Action = max(min(Action, 12000), 0);
% The rest of the logic is unchanged and now operates on the clipped action
GT = 1000*Action(1);
NextState = E - GT;
if GT - E < 0.1
    Reward = 0;
else
    Reward = -1;
end
IsDone = Reward >= 0;
Observation = NextState;
end
Clipping the action to the constraint range ensures that the agent's actions remain within the desired bounds throughout the training process.
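As a quick sanity check, you can call the modified step function with an out-of-range action and confirm that the clipped value is used (the numbers here are just illustrative):
% Illustrative check: 15000 is outside [0, 12000], so it is clipped to 12000
[obs, reward, isDone, nextState] = newgoushi(15000, 8000);
% Inside the function, GT = 1000*12000 is computed from the clipped action,
% not from the raw 15000, so the environment never sees out-of-range values.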
Hope this helps!