
Using a reinforcement learning algorithm to optimize parameter(s) of a controller

25 views (last 30 days)
Hello,
First of all, I'm relatively new to reinforcement learning. I have a project in which I need to use RL to fine-tune one, if not several, parameters of an already well-designed controller; in my case it's a discrete controller. I have come across many papers describing the use of RL specifically for control, but not many on parameter optimization, and I have trouble understanding that concept. Could someone shed some light on the use of RL for parameter fine-tuning, for example how it differs from the RL-controller concept and how it would run in parallel with the controller? I'm more than happy if you can share any references. Thanks!

Accepted Answer

Emmanouil Tzorakoleftherakis
Hi Hazwan,
The main difference between using RL for control vs. parameter tuning is that in the first case the policy directly outputs, e.g., torque, or whatever your control input is. In the latter case, the output of the policy would be parameter values; e.g., if you are trying to tune a PID, the policy would output three numbers: Kp, Ki and Kd. Obviously, the observations/inputs to the policy, as well as the reward, would probably need to be different too.
To your question on how the latter could run in parallel with the controller, I can see two scenarios:
1) Using RL for finding static gains. In this case you train, you get the constant parameter values the RL policy finds, and then you discard the policy and adjust your controller gains with these numbers (a sketch follows below).
2) Using RL for finding dynamic/observation-based parameters. This would be in some sense similar to gain scheduling, and for this case you would run the policy in parallel with the controller. The idea would be the same (i.e., the policy would output parameter values), but it would do so all the time, thus updating the controller parameters dynamically based on observations.
Hope that helps.
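
For what it's worth, here is a minimal MATLAB sketch of scenario 1 with the Reinforcement Learning Toolbox, assuming a release that supports default agent creation; the environment functions myPIDStepFcn/myPIDResetFcn, the observation layout, and the gain limits are hypothetical placeholders, not from this thread:

% Action = the three PID gains the policy proposes.
obsInfo = rlNumericSpec([3 1]);               % e.g., [error; error rate; error integral]
actInfo = rlNumericSpec([3 1], ...
    'LowerLimit', zeros(3,1), ...
    'UpperLimit', [100; 50; 10]);             % assumed gain ranges

% myPIDStepFcn/myPIDResetFcn are user-written functions that simulate the
% closed loop with the proposed gains and return a tracking-based reward.
env   = rlFunctionEnv(obsInfo, actInfo, 'myPIDStepFcn', 'myPIDResetFcn');
agent = rlDDPGAgent(obsInfo, actInfo);        % default actor/critic networks

trainOpts = rlTrainingOptions('MaxEpisodes', 500);
train(agent, env, trainOpts);

% Scenario 1 (static gains): query the trained policy once at a nominal
% observation, keep the numbers, and discard the policy afterwards.
gains = getAction(agent, {zeros(3,1)})        % may be wrapped in a cell array

For scenario 2 you would instead keep the trained policy in the loop and call getAction at every control step, feeding the returned values to the controller as its current gains.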
  8 Comments
HazwanDrK on 22 Jul 2020
Sorry, I can't quite catch that. My double integrator works in about the same way, except that instead of starting from a zero initial position, there is a step reference applied at a certain time step for state x. Do I still need to include the position error in my observation, though? Another thing: for a custom environment using function names, do I need to specify my own reward function in the step function? It was not specifically brought up in the DDPG example, so I reckoned it's predefined. Thanks!
Emmanouil Tzorakoleftherakis
I would say yes, you need the error if you are planning on tracking a collection of step responses. Otherwise, you would basically be overfitting to a single reference value, if that makes sense.
You don't need a separate function for the reward; it can be incorporated in the step function, as shown in the link you mentioned.
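
To illustrate the last point, here is a rough sketch of what a step function with the reward computed inline could look like for a double-integrator tracking task; the dynamics, sample time, reward weights, and termination threshold are assumptions for illustration, not something from this thread:

function [nextObs, reward, isDone, loggedSignals] = myStepFcn(action, loggedSignals)
% Hypothetical step function for rlFunctionEnv: the reward is computed
% right here, so no separate reward function is needed.
Ts   = 0.05;                         % sample time (assumed)
xRef = loggedSignals.Reference;      % step reference stored by the reset function

% Unpack state [position; velocity] and advance the double integrator.
x = loggedSignals.State;
u = action(1);
x = x + Ts * [x(2); u];
loggedSignals.State = x;

% Observation includes the tracking error, as suggested above.
err     = xRef - x(1);
nextObs = [err; x(2)];

% Inline reward: penalize tracking error and control effort (weights assumed).
reward = -(err^2 + 0.01*u^2);

% End the episode if the position diverges (threshold assumed).
isDone = abs(x(1)) > 10;
end

A matching reset function would initialize loggedSignals.State and loggedSignals.Reference and return the initial observation, following the [InitialObservation, LoggedSignals] = myResetFcn() signature that rlFunctionEnv expects.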


More Answers (1)

Mehrdad Moradi on 26 Jul 2021
Edited: Mehrdad Moradi on 26 Jul 2021
But how do you configure RL to find static gains when the action signal is a time series of different values? Is there any guideline about this?

Release

R2020a
