getAllOutputArguments only returns one result per node, not per core
I'm running a parallel job on an SGE cluster, asking for 48 workers. I've set procsPerNode = 8 inside parallelSubmitFcn.m, so the job should use 6 nodes, and while it's running I can see that it does.
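For reference, the node count above follows from simple arithmetic. This is a minimal sketch that assumes parallelSubmitFcn.m places procsPerNode workers on each node and requests the worker count divided by procsPerNode, rounded up. The real script may compute its resource request differently, and the variable names here are illustrative.

```matlab
% Assumption: the submit function packs procsPerNode workers onto each node,
% so it requests ceil(numWorkers / procsPerNode) nodes.
numWorkers   = 48;
procsPerNode = 8;
numNodes     = ceil(numWorkers / procsPerNode);
fprintf('%d workers at %d per node -> %d nodes\n', numWorkers, procsPerNode, numNodes);
```

Under that assumption, each of the 6 nodes is expected to host 8 workers, which is why getting back exactly 6 results looks like one result per node.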
The problem is that the result returned by getAllOutputArguments contains only 6 entries, as though there were only 1 worker per node.
My code simply returns labindex, so the result should be the integers 1..48. Below is the parallel job object after the run, followed by the results. As you can see, it reports that all 48 tasks finished, yet the result contains only the first 6 values.
What's going on?
Thanks
-Don --------------------------------------
pjob =
Parallel Job ID 144 Information
===============================
UserName : don
State : finished
SubmitTime : Tue May 08 15:32:20 EDT 2012
StartTime : Tue May 08 15:32:21 EDT 2012
Running Duration : 0 days 0h 0m 3s
- Data Dependencies
FileDependencies : /Users/don/math/MVPA/donsPause.m
PathDependencies : {}
- Associated Task(s)
Number Pending : 0
Number Running : 0
Number Finished : 48
TaskID of errors :
- Scheduler Dependent (Parallel Job)
MaximumNumberOfWorkers : 48
MinimumNumberOfWorkers : 48
>> getAllOutputArguments(pjob)
ans =
[1] [2] [3] [4] [5] [6] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] []
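To pin down exactly which tasks came back empty, a check like the one below can be run on the returned cell array. This is a sketch: `out` here is a hand-built stand-in mirroring the output above (6 values followed by 42 empty cells). On the real job you would use `out = getAllOutputArguments(pjob);` instead, which returns one row per task.

```matlab
% Stand-in for out = getAllOutputArguments(pjob): a 48x1 cell array where
% only the first 6 tasks returned their labindex.
out = [num2cell((1:6)'); cell(42, 1)];

% Logical mask of tasks whose output cell is empty (no result sent back).
isMissing = cellfun(@isempty, out);
missing   = find(isMissing)';

% Collect the values that did come back, as a row vector.
returned = cell2mat(out(~isMissing))';

fprintf('%d of %d tasks returned a value\n', numel(returned), numel(out));
fprintf('first task with no result: %d\n', missing(1));
```

If the empty tasks always start right after the node count, that pattern is consistent with only one worker per node actually delivering output.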
More Answers (1)
Thomas
on 14 May 2012
I just checked my parallelSubmitFcn file for our SGE cluster. We keep
procsPerNode = 1;
SGE has a different way of thinking about nodes. With this setting I can run even 128 processes and get the results back correctly. We have a mixture of hardware, with some generations having 8 processors per node and some having 12. procsPerNode = 1; lets everything work correctly with SGE, and SGE can then use whatever processors remain on each node after a couple of them have been taken by other applications. Your systems may vary, but this works for us and allows us to backfill jobs. :)
Thomas
on 14 May 2012
Don, I'm not sure how you define nodes/processors in your cluster. We define a node as a physical machine taking 1U of rack space. It usually has two quad-core or hex-core processors, giving 8-12 processors per node.