3D UAV Path Planning using Reinforcement Learning Agent (DDPG)

So I have taken the 3D UAV obstacle avoidance example and implemented path planning using DDPG on it. My agent learns to take the shortest path while avoiding the obstacle, but as soon as I define a reset function that spawns the agent at one of two positions, it fails to learn. The main issue is that during a single training run it learns to reach the goal from one position, and when it is spawned at the other location it was trained on, it takes the same path as before. Training from a single position converges in about 100 to 150 episodes, whereas with two spawn locations it won't converge even after 10000 episodes.
I'll share the training curve for the agent being spawned at two locations (episode values greater than or equal to 150 mean the agent reached the goal along the shortest path).
Observation:
Lidar Ranges (40 total ranges with max range of 5)
distance to goal
derivative of distance to goal
(I also tried adding more observations, such as pitch, yaw, and roll, but that ultimately kept the agent from converging as quickly)
Actions:
Pitch and Roll (-pi/7 to pi/7)
Yaw (-pi to pi)
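A minimal sketch of how these could be expressed as specifications for the agent (the exact limits, ordering, and names here are illustrative, not my exact code); these correspond to the obsInfo and actInfo passed to rlDDPGAgent below:
numLidar = 40;
% 40 lidar ranges (0 to 5) + distance to goal + derivative of distance to goal
obsInfo = rlNumericSpec([numLidar+2 1], ...
    LowerLimit=[zeros(numLidar,1); 0; -Inf], ...
    UpperLimit=[5*ones(numLidar,1); Inf; Inf]);
obsInfo.Name = "observations";
% Actions: pitch and roll limited to +/- pi/7, yaw limited to +/- pi
actInfo = rlNumericSpec([3 1], ...
    LowerLimit=[-pi/7; -pi/7; -pi], ...
    UpperLimit=[ pi/7;  pi/7;  pi]);
actInfo.Name = "pitch, roll, yaw";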
Reward Function
Pitch, Roll and Yaw have a very small penalty
Penalty for hitting Obstacle (Large)
Reward for reaching the goal
If the derivative of the distance to the goal is less than 0, its absolute value is given as a reward with a gain (rewarding the agent for maximizing its velocity towards the goal)
minor step penalty
Termination Criteria:
If hit obstacle
If reached goal
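Roughly, the reward and termination logic looks like the sketch below (the gains and the goal radius here are placeholders rather than the exact values I use):
function [reward, isDone] = stepReward(dGoal, dGoalDot, action, hitObstacle, goalRadius)
    % Placeholder gains; the real values differ
    kVel = 10; kAct = 0.01; stepPenalty = 0.05;
    reward = -stepPenalty - kAct*sum(abs(action));   % small step and pitch/roll/yaw penalties
    if dGoalDot < 0
        reward = reward + kVel*abs(dGoalDot);        % reward closing speed towards the goal
    end
    if hitObstacle
        reward = reward - 100;                       % large collision penalty (episode ends)
    elseif dGoal < goalRadius
        reward = reward + 100;                       % bonus for reaching the goal (episode ends)
    end
    isDone = hitObstacle || (dGoal < goalRadius);
end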
% Create the agent options object
agentOptions = rlDDPGAgentOptions();
% specify the options
agentOptions.SampleTime = Ts; %sampling time of 0.025 and final time of 25
agentOptions.MiniBatchSize = 256;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.MaxMiniBatchPerEpoch = 200;
% optimizer options
agentOptions.ActorOptimizerOptions.LearnRate = 1e-3;
agentOptions.ActorOptimizerOptions.GradientThreshold = 1;
agentOptions.CriticOptimizerOptions.LearnRate = 1e-3;
agentOptions.CriticOptimizerOptions.GradientThreshold = 1;
% exploration options
agentOptions.NoiseOptions.StandardDeviation = 0.15;
agentOptions.NoiseOptions.MeanAttractionConstant = 0.15;
initOpts = rlAgentInitializationOptions(NumHiddenUnit=256);
UAVQuadrotor = rlDDPGAgent(obsInfo,actInfo,initOpts,agentOptions);
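For completeness, the training setup is roughly along these lines (MaxEpisodes and the stop criterion are illustrative; env is the Simulink environment from the example):
Ts = 0.025; Tf = 25;
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=10000, ...
    MaxStepsPerEpisode=ceil(Tf/Ts), ...      % 1000 steps per episode
    ScoreAveragingWindowLength=50, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=150);
% trainResults = train(UAVQuadrotor, env, trainOpts);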

Answers (1)

Darshak on 28 Apr 2025
Hello Arsalan,
This convergence problem might have multiple causes, such as overfitting to a single start position, a poor exploration strategy, episode limits, or reward-function sensitivity. It can be addressed using the following strategies:
  1. Randomize Start Positions More Aggressively: sample the spawn point from a wider pool of candidate locations on every reset (see the reset-function sketch after this list).
  2. Modify Reward Function: Add a directionality reward: reward the UAV when its heading (yaw) is aligned towards the goal. Exploration bonus: reward the UAV for reaching unexplored states, which promotes better coverage.
  3. Prioritized Experience Replay (PER): instead of sampling experiences uniformly at random, sample the more informative transitions more often (see the replay-memory sketch at the end of this answer).
  4. Hybrid Approach (Supervised + RL): Phase 1: pre-train on a few good trajectories using behavior cloning (supervised learning). Phase 2: fine-tune with reinforcement learning to improve beyond the supervised examples.
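For point 1, with a Simulink environment the reset function can draw the spawn point from a pool of candidate positions on every episode. A minimal sketch, assuming the model reads its start position from a workspace variable named initialPosition (rename to match your model):
function in = randomStartResetFcn(in, startPositions)
    % in:             Simulink.SimulationInput passed in by the environment
    % startPositions: N-by-3 matrix of candidate [x y z] spawn points
    idx = randi(size(startPositions, 1));
    % 'initialPosition' is a placeholder; use your model's start-position variable
    in = setVariable(in, 'initialPosition', startPositions(idx, :));
end
It can then be attached with env.ResetFcn = @(in) randomStartResetFcn(in, startPositions); where env is the rlSimulinkEnv object.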
The following documentation can be referred to for the implementation of all of this: https://www.mathworks.com/help/releases/R2024b/reinforcement-learning/index.html
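For point 3, recent Reinforcement Learning Toolbox releases include rlPrioritizedReplayMemory, which can replace the agent's default buffer. A minimal sketch, reusing the same obsInfo/actInfo and buffer length as in the question:
% Swap the default replay memory for a prioritized one (point 3)
UAVQuadrotor.ExperienceBuffer = rlPrioritizedReplayMemory(obsInfo, actInfo, 1e6);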
