Parfor with GPUs crashes
Hello, everybody!
I have code that uses GPUs. I would like to run it in parallel for different settings, i.e. code(setting=1), code(setting=2), code(setting=3), etc. To do that, I am using a parfor loop on a Linux-based high-performance computing (HPC) cluster:
parfor i=1:N
code(setting=i)
end
However, it often crashes, especially when the number of workers N is large (more than 4-5). Typically, the crash is followed by MATLAB shutting down with "Bus error" or "Fatal error" in the terminal.
What I do in general is the following. First, I request the necessary resources: N workers with sufficient memory and one GPU per worker. Then I check that I do have a GPU per worker with:
spmd
gpuDeviceCount
end
After that, I initialize the parpool with:
c=parcluster;
c.NumWorkers=N;
parpool(N)
And then I run my code. Note that an individual job with one GPU (without the parfor loop) works perfectly. Also, it almost always works for 2-3 workers in parallel.
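The gpuDeviceCount check above only counts the devices each worker can see; it does not show which device a worker has actually selected. A minimal sketch of a more detailed per-worker report follows, assuming a pool is already open and Parallel Computing Toolbox is available. spmdIndex requires R2022b or later (labindex is the older equivalent), and the message format is only illustrative.

```matlab
% Sketch: report the selected GPU and its free memory on every worker.
% Assumes a parallel pool is already open.
spmd
    nVisible = gpuDeviceCount;      % number of GPUs visible to this worker
    if nVisible > 0
        g = gpuDevice;              % device currently selected on this worker
        fprintf('Worker %d: %d GPU(s) visible, using index %d (%s), %.1f GB free\n', ...
            spmdIndex, nVisible, g.Index, g.Name, g.AvailableMemory/1e9);
    else
        fprintf('Worker %d: no GPU visible\n', spmdIndex);
    end
end
```

If several workers report the same device index, they are sharing one GPU and its memory, which is worth ruling out before looking elsewhere.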
3 Comments
Raymond Norris
on 14 Mar 2023
This is requesting 5 chunks, with 4 cores and 1 GPU per chunk. But this doesn't ensure that the 5 chunks are on the same node. I also wonder why you're requesting 5 chunks? If you're running a local pool, you only need 1 chunk. Try the following:
qsub -I -X -l select=1:ncpus=4:ngpus=1:mem=20gb,software=matlab
Then in MATLAB run
pctconfig('preservejobs',true);
setenv('MDCE_DEBUG','true')
local = parcluster("local");
pool = local.parpool(4);
% Run your parallel code
If/when the pool crashes,
local.getDebugLog(local.Jobs(end))
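Put together, the steps above might look like the sketch below. The loop body is the placeholder code(setting=i) from the question, and the pool size of 4 matches the ncpus=4 request. The try/catch is an addition that assumes the MATLAB client itself survives the failure; if the client is killed by the "Bus error", the log has to be retrieved from a new session instead.

```matlab
% Sketch: run the parfor loop with debug logging enabled and print the
% worker log if the pool fails. Assumes the client session stays alive.
pctconfig('preservejobs', true);    % keep job data after the pool goes away
setenv('MDCE_DEBUG', 'true');       % verbose worker logging
local = parcluster("local");
pool = local.parpool(4);
try
    parfor i = 1:4
        code(setting=i);            % placeholder for the GPU code in the question
    end
catch err
    disp(getReport(err));                   % client-side error report
    local.getDebugLog(local.Jobs(end))      % log of the most recent (pool) job
end
```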