3D UAV Path Planning using Reinforcement Learning Agent (DDPG)

So I have taken the 3D UAV obstacle avoidance example and implemented path planning using DDPG on it. My agent learns to take the shortest path while avoiding the obstacle, but as soon as I define a reset function that spawns the agent at one of two positions, it fails to learn. The main issue is that during a single training run it learns to reach the goal from one position, and when it is spawned at the other location it was trained on, it takes the same path as before. Training from a single position converges in about 100 to 150 episodes, whereas with two spawn locations it won't converge even after 10000 episodes.
I'll share the training curve for the agent being spawned at two locations (episode values greater than or equal to 150 mean the agent reached the goal along the shortest path).
Observation:
Lidar Ranges (40 total ranges with max range of 5)
distance to goal
derivative of distance to goal
(I also tried adding more observations, such as pitch, yaw, and roll, but that ultimately kept the agent from converging as quickly)
Actions:
Pitch and Roll (-pi/7 to pi/7)
Yaw (-pi to pi)
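A minimal sketch of how these could be expressed as specifications for the agent (the exact limits, ordering, and names here are illustrative, not my exact code); these correspond to the obsInfo and actInfo passed to rlDDPGAgent below:
numLidar = 40;
% 40 lidar ranges (0 to 5) + distance to goal + derivative of distance to goal
obsInfo = rlNumericSpec([numLidar+2 1], ...
    LowerLimit=[zeros(numLidar,1); 0; -Inf], ...
    UpperLimit=[5*ones(numLidar,1); Inf; Inf]);
obsInfo.Name = "observations";
% Actions: pitch and roll limited to +/- pi/7, yaw limited to +/- pi
actInfo = rlNumericSpec([3 1], ...
    LowerLimit=[-pi/7; -pi/7; -pi], ...
    UpperLimit=[ pi/7;  pi/7;  pi]);
actInfo.Name = "pitch, roll, yaw";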
Reward Function
Pitch, Roll and Yaw have a very small penalty
Penalty for hitting Obstacle (Large)
Reward for reaching the goal
If the derivative of the distance to the goal is less than 0, its absolute value is given as a reward with a gain (rewarding the agent for maximizing its velocity towards the goal)
minor step penalty
Termination Criteria:
If hit obstacle
If reached goal
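Roughly, the reward and termination logic looks like the sketch below (the gains and the goal radius here are placeholders rather than the exact values I use):
function [reward, isDone] = stepReward(dGoal, dGoalDot, action, hitObstacle, goalRadius)
    % Placeholder gains; the real values differ
    kVel = 10; kAct = 0.01; stepPenalty = 0.05;
    reward = -stepPenalty - kAct*sum(abs(action));   % small step and pitch/roll/yaw penalties
    if dGoalDot < 0
        reward = reward + kVel*abs(dGoalDot);        % reward closing speed towards the goal
    end
    if hitObstacle
        reward = reward - 100;                       % large collision penalty (episode ends)
    elseif dGoal < goalRadius
        reward = reward + 100;                       % bonus for reaching the goal (episode ends)
    end
    isDone = hitObstacle || (dGoal < goalRadius);
end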
% Create the agent options object
agentOptions = rlDDPGAgentOptions();
% specify the options
agentOptions.SampleTime = Ts; %sampling time of 0.025 and final time of 25
agentOptions.MiniBatchSize = 256;
agentOptions.ExperienceBufferLength = 1e6;
agentOptions.MaxMiniBatchPerEpoch = 200;
% optimizer options
agentOptions.ActorOptimizerOptions.LearnRate = 1e-3;
agentOptions.ActorOptimizerOptions.GradientThreshold = 1;
agentOptions.CriticOptimizerOptions.LearnRate = 1e-3;
agentOptions.CriticOptimizerOptions.GradientThreshold = 1;
% exploration options
agentOptions.NoiseOptions.StandardDeviation = 0.15;
agentOptions.NoiseOptions.MeanAttractionConstant = 0.15;
initOpts = rlAgentInitializationOptions(NumHiddenUnit=256);
UAVQuadrotor = rlDDPGAgent(obsInfo,actInfo,initOpts,agentOptions);
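For completeness, the training setup is roughly along these lines (MaxEpisodes and the stop criterion are illustrative; env is the Simulink environment from the example):
Ts = 0.025; Tf = 25;
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=10000, ...
    MaxStepsPerEpisode=ceil(Tf/Ts), ...      % 1000 steps per episode
    ScoreAveragingWindowLength=50, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=150);
% trainResults = train(UAVQuadrotor, env, trainOpts);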

Answers (1)

Darshak on 28 Apr 2025
Hello Arsalan,
This convergence problem might have multiple causes, such as overfitting to a single start position, a poor exploration strategy, episode limits, or reward-function sensitivity. It can be addressed using the following strategies:
  1. Randomize Start Positions More Aggressively: sample the spawn point from a wider pool of candidate locations on every reset (see the reset-function sketch after this list).
  2. Modify Reward Function: Add a directionality reward: reward the UAV when its heading (yaw) is aligned towards the goal. Exploration bonus: reward the UAV for reaching unexplored states, which promotes better coverage.
  3. Prioritized Experience Replay (PER): instead of sampling experiences uniformly at random, sample the more informative transitions more often (see the replay-memory sketch at the end of this answer).
  4. Hybrid Approach (Supervised + RL): Phase 1: pre-train on a few good trajectories using behavior cloning (supervised learning). Phase 2: fine-tune with reinforcement learning to improve beyond the supervised examples.
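For point 1, with a Simulink environment the reset function can draw the spawn point from a pool of candidate positions on every episode. A minimal sketch, assuming the model reads its start position from a workspace variable named initialPosition (rename to match your model):
function in = randomStartResetFcn(in, startPositions)
    % in:             Simulink.SimulationInput passed in by the environment
    % startPositions: N-by-3 matrix of candidate [x y z] spawn points
    idx = randi(size(startPositions, 1));
    % 'initialPosition' is a placeholder; use your model's start-position variable
    in = setVariable(in, 'initialPosition', startPositions(idx, :));
end
It can then be attached with env.ResetFcn = @(in) randomStartResetFcn(in, startPositions); where env is the rlSimulinkEnv object.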
The following documentation can be referred to for the implementation of all of this: https://www.mathworks.com/help/releases/R2024b/reinforcement-learning/index.html
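For point 3, recent Reinforcement Learning Toolbox releases include rlPrioritizedReplayMemory, which can replace the agent's default buffer. A minimal sketch, reusing the same obsInfo/actInfo and buffer length as in the question:
% Swap the default replay memory for a prioritized one (point 3)
UAVQuadrotor.ExperienceBuffer = rlPrioritizedReplayMemory(obsInfo, actInfo, 1e6);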
