Reinforcement Learning agent converges to a suboptimal policy

Question

Jeehwan Lee on 13 Nov 2022

Direct link to this question

https://it.mathworks.com/matlabcentral/answers/1849648-reinforcement-learning-agent-converges-to-a-suboptimal-policy

Answered: Emmanouil Tzorakoleftherakis on 13 Feb 2023

Hello

I am trying to train an agent on a multi-period optimal capacity planning problem. The system has two uncertainties that are stochastic but Markovian, and a third state, which is the capacity. The benchmark is a single-period planning problem, which I have already solved with MINLP optimization.

I have spent many weeks trying different agents, but so far I have not succeeded in getting the agent to learn correctly.

In the graph below (actor-critic agent), learning appears to take place, but the converged value is suboptimal: it is lower than the single-period optimization value.

One of the uncertainties is demand. In theory, the agent should increase capacity in response to the demand it observes as a state. At convergence, however, it does not do this properly.

Note that although I have defined the actions as discrete, not all actions are feasible. To compensate for this, I have clipped the actions as follows:

if TIME_P < DEPLOY_T
    % If OPTION TO ABANDON is added, then [-CAP_UPPER+INS_CAP:5:CAP_UPPER-INS_CAP]
    Action(Action > 1-INS_CAP) = 1-INS_CAP;
else
    % TIME_P >= DEPLOY_T: capacity planning actions can no longer be exercised
    Action = 0;
end

Here, DEPLOY_T is the number of years the capacity planning actions can be exercised. The time-step continues to TERMINAL_P to account for more future cash flows.
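One thing that may be worth checking with this kind of clipping is how many of the discrete actions collapse onto the same effective action. The sketch below applies the clipping rule from the question to a made-up action grid. The values of `INS_CAP` and `actionSet` are assumptions chosen for illustration only and are not taken from the original model.

```matlab
% Hypothetical values for illustration only (not from the original post).
INS_CAP   = 0.6;          % assumed installed capacity (normalized)
actionSet = -1:0.25:1;    % assumed grid of discrete capacity-change actions

% Same clipping rule as in the question, applied to every candidate action.
effective = actionSet;
effective(effective > 1-INS_CAP) = 1-INS_CAP;

% Count the raw actions that were replaced by the clipped value.
numCollapsed = nnz(actionSet > 1-INS_CAP);
fprintf('%d of %d actions map to %.2f after clipping\n', ...
    numCollapsed, numel(actionSet), 1-INS_CAP);
```

If a large share of the action set collapses this way, those actions yield identical transitions and rewards, so the agent gets no signal to tell them apart. This may or may not matter in the actual model, but it is cheap to check.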

I was wondering if anyone has any tips or could possibly look at the code for me. (@Emmanouil Tzorakoleftherakis's answers on this forum have been particularly helpful, but I have had no luck so far.)

Answer 1

Emmanouil Tzorakoleftherakis on 13 Feb 2023

Direct link to this answer

https://it.mathworks.com/matlabcentral/answers/1849648-reinforcement-learning-agent-converges-to-a-suboptimal-policy#answer_1170580

Hello,

You mention a graph in your question, but it does not appear to have been attached.

It sounds like the agent you trained has converged to a suboptimal solution. If that's the case, you probably need to tweak your reward a bit (make sure it is equivalent to the objective of your benchmark problem) and make sure the agent keeps exploring throughout training. Starting simple with a DQN agent would help. The EpsilonDecay and EpsilonMin values are important for exploration (see here). You may also want to randomize the initial condition of your environment, which could help the agent escape the local solution it converged to.
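To make the exploration advice concrete, the sketch below estimates how many training steps it takes for epsilon to reach its floor. It assumes the multiplicative update `Epsilon = Epsilon*(1-EpsilonDecay)` applied once per training step, which is my understanding of how the epsilon-greedy options of a DQN agent in Reinforcement Learning Toolbox behave. The three numeric values are placeholders, not recommendations.

```matlab
% Placeholder values for illustration; substitute the ones from your agent options.
epsilon0     = 1;       % assumed initial exploration rate
epsilonMin   = 0.01;    % assumed exploration floor (EpsilonMin)
epsilonDecay = 1e-4;    % assumed per-step decay rate (EpsilonDecay)

% With multiplicative decay, eps_k = eps_0 * (1 - decay)^k.
% Solve eps_k <= epsilonMin for the smallest integer k.
stepsToMin = ceil(log(epsilonMin/epsilon0) / log(1 - epsilonDecay));
fprintf('Epsilon reaches its minimum after about %d steps\n', stepsToMin);
```

Comparing `stepsToMin` with the total number of training steps (episodes times steps per episode) gives a rough sense of whether exploration ends early in training. If it does, a smaller EpsilonDecay or a larger EpsilonMin keeps the agent exploring for longer.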
