
How to add a stability constraint (eigenvalues of the closed-loop system A-BK < 0) in a DDPG-based LQR controller

9 views (last 30 days)
Hello Everyone,
I am trying to train an agent for LQR-type control. The system is in state-space form, x_dot = Ax + Bu. I wanted to know whether there is any way to add a stability constraint during training, i.e., to ensure that for the feedback gain K the actor computes (the weights of the actor network), the eigenvalues of the closed-loop system A - BK always have negative real parts. My actor and critic models are given as follows:
%% Critic neural network
obsPath = featureInputLayer(obsInfo.Dimension(1),Name="obsIn");
actPath = featureInputLayer(actInfo.Dimension(1),Name="actIn");
commonPath = [
    concatenationLayer(1,2,Name="concat")
    quadraticLayer
    fullyConnectedLayer(1,Name="value", ...
        BiasLearnRateFactor=0,Bias=0)
    ];
% Add layers to layerGraph object
criticNet = layerGraph(obsPath);
criticNet = addLayers(criticNet,actPath);
criticNet = addLayers(criticNet,commonPath);
% Connect layers
criticNet = connectLayers(criticNet,"obsIn","concat/in1");
criticNet = connectLayers(criticNet,"actIn","concat/in2");
criticNet = dlnetwork(criticNet);
critic = rlQValueFunction(criticNet, ...
obsInfo,actInfo, ...
ObservationInputNames="obsIn",ActionInputNames="actIn");
%% Actor neural network
Biass = zeros(actInfo.Dimension(1),1); % no biasing linear actor
actorNet = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(actInfo.Dimension(1), ...
        BiasLearnRateFactor=0,Bias=Biass)
    ];
actorNet = dlnetwork(actorNet);
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
getAction(actor,{rand(obsInfo.Dimension)})
Any suggestions would be much appreciated.
Thanks,
Nadeem

Answers (1)

Kartik Saxena on 23 Nov 2023
Hi,
I understand that you want to ensure the stability of a linear system by constraining the eigenvalues of the closed-loop system matrix (A - BK) to have negative real parts.
There are several approaches you can take to encourage stability of the learned control policy:
1. Reward Shaping
Modify the reward function to penalize unstable behavior. This can be done by detecting when the system is becoming unstable (e.g., by monitoring the magnitude of the states or their growth rate) and assigning a large negative reward.
2. Custom Training Loop
Implement a custom training loop in which you check the eigenvalues of (A - BK) after each update of the actor network. If the updated policy results in an unstable system, you can revert to the previous policy or apply a penalty to the reward (a minimal sketch of this check is given after the list).
3. Projection Layer
Add a custom layer to the actor network that projects the output (the control matrix K) onto the set of stabilizing controllers. This is a complex approach because the set of stabilizing controllers is not convex, but some approximate methods might be applied.
4. Lyapunov-Based Stability
Use a Lyapunov function to ensure stability. You can design a neural network to approximate a Lyapunov function and train it alongside the actor, so that the learned policy decreases the Lyapunov function along trajectories, which implies stability.
5. Pre-Training with LQR
Pre-train the actor network to output the optimal LQR gain K before starting the reinforcement learning training. This provides a starting point that is known to be stabilizing, and subsequent training can fine-tune the policy from there (see the sketch after this list).
6. Constrained Optimization
Use constrained optimization techniques during training to ensure that the policy updates satisfy the stability constraint. This is a more advanced approach and may require significant modification to the training algorithm.
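To make ideas 2 and 5 concrete, here is a minimal sketch. It assumes the plant matrices A and B and some LQR weights Qlqr and Rlqr are available in the workspace, that the Control System Toolbox function lqr can be used, that actor is the single-layer linear actor from the question, that agent is the rlDDPGAgent built from it, and that the first learnable parameter is the fully connected layer's weight matrix:
% --- Idea 5: initialize the actor at the stabilizing LQR gain (before building the agent) ---
Klqr = lqr(A, B, Qlqr, Rlqr);            % optimal gain for u = -K*x (Control System Toolbox)
params = getLearnableParameters(actor);  % assumed order: {Weights, Bias} of the fc layer
params{1} = -Klqr;                       % the linear actor computes u = W*x, so W = -K
actor = setLearnableParameters(actor, params);

% --- Idea 2: after a policy update, check the closed-loop eigenvalues ---
params = getLearnableParameters(getActor(agent));
W = params{1};
if isa(W, "dlarray")
    W = extractdata(W);                  % convert to a plain numeric matrix
end
eigVals = eig(A + B*W);                  % closed loop with u = W*x, i.e. A - B*K with K = -W
if any(real(eigVals) >= 0)
    % unstable: revert to the last stabilizing parameters or penalize this update
end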
Refer to the MathWorks documentation on the Reinforcement Learning Toolbox to learn more about these approaches.
I hope this resolves your issue.
4 Comments
Kartik Saxena on 24 Nov 2023
To integrate Lyapunov stability into the reward function of a reinforcement learning (RL) algorithm in MATLAB, you would need to define a Lyapunov function that is suitable for your system. The Lyapunov function should be positive definite and its derivative along the system trajectories should be negative definite.
Below is a conceptual example of how you might incorporate a Lyapunov function into the reward function. This example assumes you have a linear system and a quadratic Lyapunov function 'V(x) = x'Px', where 'P' is a positive definite matrix. The goal is to ensure that the eigenvalues of the closed-loop system matrix (A-BK) have negative real parts, where 'K' is the control gain matrix.
This code is not a complete solution but rather a starting point to illustrate how you might begin to integrate a Lyapunov function into your RL framework. You will likely need to refine the Lyapunov function and adjust the reward computation to fit your specific system and learning algorithm.
Refer to the following code snippet to get an idea of how this can be achieved:
function reward = lyapunovStabilityReward(x, u, A, B, K, P)
% x:    Current state vector
% u:    Control action (output of the actor network)
% A, B: System matrices
% K:    Control gain matrix (reshaped from the actor's output)
% P:    Positive definite matrix for the Lyapunov function

    % Closed-loop system matrix
    A_cl = A - B * K;

    % Check whether the closed-loop system is stable
    eigVals = eig(A_cl);
    isStable = all(real(eigVals) < 0);

    % Compute the Lyapunov function value
    V = x' * P * x;

    % Compute the derivative of the Lyapunov function along the system trajectory
    V_dot = x' * (A_cl' * P + P * A_cl) * x;

    % Check whether the derivative of the Lyapunov function is negative definite
    isLyapunovDecreasing = V_dot < 0;

    % Define the reward
    if isStable && isLyapunovDecreasing
        % Positive reward for stability and a decreasing Lyapunov function
        reward = 100;
    else
        % Negative reward for instability or a non-decreasing Lyapunov function
        reward = -1000;
    end
end
To use this reward function, you would need to call it at each step of your training process, passing in the current state 'x', the action 'u', the system matrices 'A' and 'B', and the control gain matrix 'K'. The positive definite matrix 'P' can be found by solving the Lyapunov equation for a given 'A' matrix:
% Assuming a stable A matrix for demonstration purposes
A = [-1 0; 0 -2];
Q = eye(size(A));  % Choose a positive definite Q matrix
P = lyap(A', Q);   % Solve A'*P + P*A = -Q (note the transpose: lyap(M,N) solves M*X + X*M' + N = 0)
Please note that this code is meant to serve as a conceptual guide and will need to be adapted to fit into your specific RL training framework. The reward function must be integrated with the rest of your RL training loop, and you may need to adjust the parameters and the Lyapunov function to ensure that it is appropriate for your system. Additionally, the reward values (100 and -1000 in the example) are arbitrary and should be tuned based on the specifics of your problem and learning algorithm.
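For example, if the environment is built with rlFunctionEnv, the reward computation can live inside the custom step function. The sketch below assumes an Euler-discretized plant with sample time Ts, hypothetical function names myStepFcn and myResetFcn, and a gain matrix K supplied from the workspace (the current actor gain is not directly available inside the step function, which is the difficulty raised in the comment below):
function [nextObs, reward, isDone, loggedSignals] = myStepFcn(action, loggedSignals, A, B, K, P, Ts)
    % Assumes loggedSignals.State holds the current state vector
    x = loggedSignals.State;
    u = action;

    % Lyapunov-based reward from the snippet above
    reward = lyapunovStabilityReward(x, u, A, B, K, P);

    % Euler step of the continuous-time dynamics x_dot = A*x + B*u
    x = x + Ts*(A*x + B*u);
    nextObs = x;
    loggedSignals.State = x;

    % End the episode if the state diverges
    isDone = norm(x) > 100;
end

% Bind the extra parameters with an anonymous function when creating the environment:
% stepHandle = @(action,logged) myStepFcn(action, logged, A, B, K, P, Ts);
% env = rlFunctionEnv(obsInfo, actInfo, stepHandle, @myResetFcn);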
I hope this resolves your issue.
Muhammad Nadeem on 24 Nov 2023
Thank you for the response. The main issue is actually how to get K (the actor network weights) at every step of the episode, since I cannot pass the agent to the step function and then use
actor = getActor(agent);
params = getLearnableParameters(actor);
and so on, as this doesn't work.
Any idea how I can extract K at every episode step?
Thanks again,
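One way to work around this is to drive training one episode at a time from a script, so the agent handle stays in scope and its actor weights can be inspected between episodes. A rough sketch, assuming agent, env, A and B already exist in the workspace and that the experience buffer is retained between calls to train (the default behavior):
% Run one episode per call to train and check the closed loop in between
trainOpts = rlTrainingOptions(MaxEpisodes=1, Verbose=false, Plots="none");
for episode = 1:500                                    % number of episodes is arbitrary here
    train(agent, env, trainOpts);                      % agent is a handle and is updated in place
    params = getLearnableParameters(getActor(agent));  % assumed order: {Weights, Bias}
    W = params{1};                                     % linear actor: u = W*x
    if isa(W, "dlarray")
        W = extractdata(W);
    end
    if any(real(eig(A + B*W)) >= 0)                    % A + B*W equals A - B*K with K = -W
        warning('Episode %d: closed-loop system is unstable.', episode)
    end
end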
