parpool() stalls on Xeon Phi x200 with >50 workers
I am evaluating parpool() on my new Intel Xeon Phi "Knights Landing" 7210. I find that parpool('local',NumWorkers) successfully creates a pool for NumWorkers<51, but it stalls and fails for any number equal to or greater than 51.
My system: 64 physical cores | 256 logical cores | 6x16GB memory | OS = CentOS Linux | MATLAB version R2018a
Attempted solutions: (1) changed the Java heap size between 512MB and 8192MB; (2) set the Java ThreadStackSize via $MATLAB/bin/glnxa64/java.opts (tried -XX:ThreadStackSize=8192 and 16384); (3) called distcomp.feature('LocalUseMpiexec', false);
Each worker created by parpool takes about 0.5GB (according to top), so plenty of system memory remains. Java memory resources do not seem to be depleted either.
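As a rough check of that claim, the memory arithmetic can be sketched as follows. The figures are the ones quoted in this post (0.5GB per worker from top, 6x16GB installed); the variable names are illustrative only.

```matlab
% Rough memory budget for a 51-worker pool, using the figures quoted above.
memPerWorkerGB = 0.5;       % per-worker footprint reported by top
installedGB    = 6 * 16;    % six 16GB modules
numWorkers     = 51;        % smallest pool size that stalls

poolGB     = numWorkers * memPerWorkerGB;   % memory the workers would need
headroomGB = installedGB - poolGB;          % what is left for everything else

fprintf('Pool needs about %.1f GB, leaving about %.1f GB of %d GB.\n', ...
        poolGB, headroomGB, installedGB);
```

By this estimate a 51-worker pool needs about 25.5GB, well below the installed 96GB.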
Here is a test I ran:
%% parpool() test
distcomp.feature('LocalUseMpiexec', false)
JavaRuntimeSettings = java.lang.management.ManagementFactory.getRuntimeMXBean.getInputArguments
[~,freeSystemMemory] = system('vmstat -s -S M | grep "free memory"')
rJavaObj = java.lang.Runtime.getRuntime;
freeMemory = rJavaObj.freeMemory
totalMemory = rJavaObj.totalMemory
maxMemory = rJavaObj.maxMemory
for NumberOfWorkers = [50, 51]
    tic
    pool = parpool('local',NumberOfWorkers)
    TimeElapsed = toc
    [~,freeSystemMemory] = system('vmstat -s -S M | grep "free memory"')
    rJavaObj = java.lang.Runtime.getRuntime;
    freeMemory = rJavaObj.freeMemory
    totalMemory = rJavaObj.totalMemory
    maxMemory = rJavaObj.maxMemory
    delete(pool)
end
And here is the output I get:
ans =
logical
0
JavaRuntimeSettings =
[-Xms64m, -XX:NewRatio=3, -Xmx2048m, -XX:MaxDirectMemorySize=2147400000, -XX:+AllowUserSignalHandlers, -Xrs, -XX:ThreadStackSize=16384, -Djava.library.path=/usr/local/MATLAB/R2018a/bin/glnxa64:/usr/local/MATLAB/R2018a/sys/jxbrowser/glnxa64/lib, vfprintf, -XX:ErrorFile=/home/mph/hs_error_pid38489.log, abort, -Duser.language=en, -Duser.country=US, -Dfile.encoding=UTF-8, -XX:ParallelGCThreads=6]
freeSystemMemory =
' 85393 M free memory
'
freeMemory =
313054528
totalMemory =
458752000
maxMemory =
1.9687e+09
Starting parallel pool (parpool) using the 'local' profile ...
connected to 50 workers.
pool =
Pool with properties:
Connected: true
NumWorkers: 50
Cluster: local
AttachedFiles: {}
AutoAddClientPath: true
IdleTimeout: 3 minutes (3 minutes remaining)
SpmdEnabled: true
TimeElapsed =
69.1710
freeSystemMemory =
' 65170 M free memory
'
freeMemory =
351541184
totalMemory =
448266240
maxMemory =
1.9687e+09
Parallel pool using the 'local' profile is shutting down.
Starting parallel pool (parpool) using the 'local' profile ...
connected to 51 workers.
At that point it stalls and I never get the prompt back. Using the top command in the Linux terminal I can see plenty of idle MATLAB workers.
When I terminate the process (Ctrl+C) within MATLAB I get the following:
Operation terminated by user during parallel.internal.queue.JavaBackedFuture/waitScalar (line 211)
In parallel.Future>@(o)waitScalar(o,predicate,waitGranularity,deadline)
In parallel.Future/wait (line 292)
ret = all(arrayfun(@(o) waitScalar(o, predicate, waitGranularity, deadline), ...
In parallel.Future/fetchOutputsImpl (line 574)
wait(F);
In parallel.Future/fetchOutputs (line 341)
varargout = fetchOutputsImpl(F(:), nargout, varargin{:});
In parallel.Pool>iPostLaunchSetup (line 674)
mapping = fetchOutputs(parfevalOnAll(pool, @iGetMachineToWorkerMappingAndUnfreezePaths, 1, ...
In parallel.Pool.hBuildPool (line 588)
iPostLaunchSetup(aPool, client.ParallelJob.AdditionalPaths);
In parallel.internal.pool.doParpool (line 18)
pool = parallel.Pool.hBuildPool(constructorArgs{:});
In parpool (line 98)
pool = parallel.internal.pool.doParpool(varargin{:});
In partictoc (line 12)
pool = parpool('local',NumberOfWorkers)
So, what are these workers waiting for, and why? How can I make them do work?
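The stack trace shows the client blocked in fetchOutputs on a parfevalOnAll future during pool setup. A minimal sketch of the same call pattern on a pool size that does start, using wait with a timeout so the prompt comes back, might look like the following. The 60 second timeout and the use of the undocumented feature('getpid') call as a trivial task are assumptions for illustration. This does not fix the stall; it only probes whether every worker answers.

```matlab
% Probe a running pool: does every worker answer a trivial request?
pool = gcp('nocreate');              % reuse the current pool if there is one
if isempty(pool)
    pool = parpool('local', 50);     % 50 is the largest size that worked above
end

% Ask every worker for its process ID (feature('getpid') is undocumented).
f = parfevalOnAll(pool, @() feature('getpid'), 1);

timeoutSeconds = 60;                 % assumed value, adjust as needed
finishedInTime = wait(f, 'finished', timeoutSeconds);

if finishedInTime
    pids = fetchOutputs(f);          % one entry per worker
    fprintf('%d of %d workers responded.\n', numel(pids), pool.NumWorkers);
else
    fprintf('No complete response within %d s (future state: %s).\n', ...
            timeoutSeconds, f.State);
    cancel(f);                       % give up on the outstanding request
end
```

Unlike the fetchOutputs call inside parpool, the timed wait returns control to the client even if some workers never reply.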
Answers (1)
Sangeetha Jayaprakash
on 21 May 2018
Hi,
If you are referring to Xeon Phi host processors (as introduced with the Knights Landing architecture), they are compatible with the Parallel Computing Toolbox, like any other x86_64 processor with multiple cores. Xeon Phi coprocessors, by contrast, are not currently supported.
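One way to see what a MATLAB session detects on such a host is sketched below. It is a minimal check, assuming the default 'local' profile from the question; the variable names are illustrative, and the values printed depend entirely on the machine and profile settings.

```matlab
% What does this MATLAB session detect on the host?
nLogical = java.lang.Runtime.getRuntime.availableProcessors;  % logical CPUs seen by the JVM
nThreads = maxNumCompThreads;      % computational threads MATLAB will use
c        = parcluster('local');    % the profile used in the question
nWorkers = c.NumWorkers;           % worker limit stored in that profile

fprintf('Logical CPUs seen by the JVM: %d\n', nLogical);
fprintf('Max computational threads:    %d\n', nThreads);
fprintf('Local profile NumWorkers:     %d\n', nWorkers);
```

If the profile's NumWorkers is lower than the pool size requested, parpool is expected to report an error rather than stall, so this check mainly confirms that the host is seen as an ordinary multicore machine.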