Slow Training of RL Agent on HPC Compared to Local Machine

I am currently running a MATLAB R2021a script (execute.m, attached for reference) that trains a reinforcement learning (RL) agent in Simulink to control a drone. When I train it on my local machine, it connects to 6 workers and trains much faster than on the HPC, where it connects to 12 workers. I have made sure that the whole node, with 28 cores in total, is assigned to the job.
Here is the SLURM script:
#!/bin/bash -l
#SBATCH -J MATLAB_Execute # Job name
#SBATCH -N 1 # Number of nodes
#SBATCH -n 1 # Number of tasks (1 instance of the program)
#SBATCH -c 28 # Number of CPU cores per task
#SBATCH --gres=gpu:0 # Number of GPUs per node
#SBATCH --time=1:00:00 # Time limit (1 hour)
#SBATCH -p batch -C skylake # Partition name and node feature constraint
export JAVA_LOG_DIR=/scratch/users/gshetty/java_logs
mkdir -p $JAVA_LOG_DIR
# Load the MATLAB module
module load math/MATLAB/2021a
module load openssl/1.1.1k
export LD_PRELOAD=/usr/lib64/libcrypto.so.1.1
# Run the MATLAB script
srun matlab -nodisplay -nosplash -r execute -logfile execute.out
What could be the reason for this?
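A first thing worth ruling out is whether the job step that runs MATLAB actually sees all 28 requested cores, since 12 workers squeezed onto fewer cores would train slower than 6 workers on a laptop. The lines below are a minimal diagnostic sketch, not part of the original script; they assume they are placed in the job script before the `srun matlab ...` line. `nproc` reports the CPUs the current process may run on, and comparing it inside and outside `srun` shows whether the step lost CPUs (on some SLURM versions, a step launched with `srun` does not automatically inherit the `-c` value of the batch job).

```shell
#!/bin/bash
# Diagnostic sketch (hypothetical addition to the job script, before the MATLAB call).

# CPUs requested with -c; only set inside a SLURM job, otherwise prints "unset".
echo "SLURM_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK:-unset}"

# CPUs usable by the batch script itself (nproc honours the CPU affinity mask).
echo "batch step nproc=$(nproc)"

# CPUs usable inside a job step launched with srun, the way MATLAB is launched.
# srun only works inside an allocation, so guard on SLURM_JOB_ID.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    echo "srun step nproc=$(srun -n 1 nproc)"
fi
```

If the `srun` step reports far fewer than 28 CPUs, the MATLAB workers are sharing those few cores.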
4 Comments
Gaurav on 6 Jun 2024
I should also mention that I use R2021a because that is the version loaded on my HPC.
Harald on 7 Jun 2024
Hi,
That's a big difference indeed. If it takes hours on the HPC, I am surprised that it finishes at all, since you have specified a time limit.
If you get error messages, please copy the exact error message and the code that throws it. That makes it easier to investigate.
Assuming we are talking about run time, and not time your job may spend queued waiting for resources to become available, I cannot imagine why it would take that long on the HPC.
If nobody here has further ideas, it may be worth reaching out to Technical Support: https://www.mathworks.com/support/contact_us.html
Best wishes,
Harald


Answers (0)
