Reinforcement Learning Toolbox - Change Action Space

(I'm using a DQN agent in a custom template environment.)
Is there a way to change the action space from which the action is chosen, based on the current state, during an episode?
For example, suppose I have an agent that moves around a room by choosing the direction of motion. When it reaches the edge of the room in one direction, I would like it to no longer be able to choose the direction that would lead it off the edge, thus reducing the action space.
Basically, I want to reduce the action space to handle illegal moves.

Accepted Answer

Emmanouil Tzorakoleftherakis
Hi Federico,
Unfortunately, the action space is fixed once created. To reduce the number of times an action is selected, you could penalize it in the reward signal when certain criteria are met.
I hope this helps.
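For illustration, here is a minimal sketch of how such a penalty could be added inside a custom step function for the room example in the question. The room size, action encoding, position signal, and reward values are all assumptions, not part of the toolbox API:

% Sketch of a custom environment step function that penalizes moves which
% would take the agent off the edge of the room. Room size, action
% encoding (1=up, 2=down, 3=left, 4=right), and reward values are assumed.
function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
    roomSize = [10 10];                  % assumed room dimensions
    pos      = LoggedSignals.Position;   % current [x y] position
    moves    = [0 1; 0 -1; -1 0; 1 0];   % up, down, left, right
    newPos   = pos + moves(Action,:);

    Reward = -1;                         % small per-step cost (assumed)
    if any(newPos < 1) || any(newPos > roomSize)
        % Illegal move: keep the agent in place and add a large penalty
        % so it learns to avoid directions that lead off the edge.
        newPos = pos;
        Reward = Reward - 10;
    end

    LoggedSignals.Position = newPos;
    NextObs = newPos(:);                 % observation: current position
    IsDone  = false;                     % termination logic omitted
end

The key point is only that the illegal direction is still selectable; it simply becomes less attractive through the reward, which is what the answer above suggests.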
2 Comments
Federico Sello on 23 Jul 2019
Hi Emmanouil,
Thank you for your answer. Unfortunately, every attempt so far to give a negative reward to the action I didn't want the agent to take hasn't worked: after some initial time, the agent still chooses to perform that action. I don't really know how to explain this behaviour. I've tried changing the agent options, the training options, the reward function, and the neural network architecture, but nothing worked. I suppose I should ask another question for that; anyway, thanks again for the info.
Emmanouil Tzorakoleftherakis
In general, DQN has a tendency to choose optimistically estimated values more frequently due to maximization bias. Some additional things that may help:
1) Make sure you are using double DQN (check the DQN agent options) to reduce overestimation.
2) Play with the exploration settings. After exploration decays considerably, the agent tends to choose what is best according to its current value estimates, which may not have converged to the true values. Decreasing the decay rate may help. Both settings are illustrated in the sketch after this list.
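As a rough illustration of both points, here is a sketch of the relevant DQN agent options; the numeric values are placeholders to experiment with, not recommendations:

% Sketch: enable double DQN and slow the epsilon-greedy decay.
agentOpts = rlDQNAgentOptions;
agentOpts.UseDoubleDQN = true;                            % point 1: reduce overestimation
agentOpts.EpsilonGreedyExploration.Epsilon      = 1;      % start fully exploratory
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 1e-4;   % point 2: smaller value = slower decay
agentOpts.EpsilonGreedyExploration.EpsilonMin   = 0.05;   % keep some exploration late in training
% agent = rlDQNAgent(critic,agentOpts);                   % pass the options when creating the agent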


More Answers (0)

Release: R2019a
