How to create pool of workers from a list of hostnames?

Hi all. In the past I've used the Parallel Computing Toolbox on an HPC cluster to submit a job to the scheduler; MATLAB then connects to that job and creates a worker pool. The problem is that on a shared HPC cluster the MATLAB session that does this must itself be started from a job, and it's not guaranteed that both jobs will start at around the same time.
What I would like to do is request all compute resources in advance, start up MATLAB on a single host, and then provide it with a list of the remaining hostnames where it can start up worker pools. Is this possible?
Any advice would be appreciated. Thanks!

Accepted Answer

Raymond Norris on 18 Feb 2021
This isn't feasible at this time; however, you might consider wrapping your code with batch. For example, let's assume your code looks something like this:
function my_parallel_code
% ...
nworkers = ... ;
% Check whether a parallel pool is running. If so, use it. If not, start a pool with 'nworkers' workers.
p = gcp('nocreate');
if isempty(p)
    % Pool has not been started; spawn a pool across multiple nodes (kicks off a new job).
    % If the pool has already been started, that was done in "submit_job" by the call to batch().
    parpool('hpc',nworkers);
end
parfor idx = 1:N
    % parallel code
end
Now, write a wrapper function:
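function submit_job
nworkers = ... ;  % number of pool workers, as in my_parallel_code
c = parcluster('hpc');
% Run my_parallel_code as a batch job (0 outputs, no inputs) with a pool of nworkers workers.
% The job uses one additional worker to run the function itself.
j = c.batch(@my_parallel_code,0,{},'Pool',nworkers);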
Submit your MATLAB job (e.g. using Slurm), calling submit_job instead of calling my_parallel_code directly:
#!/bin/bash
#SBATCH -n 1
module load matlab
matlab -batch submit_job
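Since batch returns as soon as the job is queued, the Slurm job above finishes quickly and the pool job runs on its own. As a follow-up, the sketch below shows one way the submitted job could be checked later from any MATLAB session that uses the same 'hpc' profile. It is not part of the original answer: it assumes the job of interest is the most recent one listed for the profile, which may not hold if other jobs were submitted in the meantime.

```matlab
% Minimal sketch: inspect the batch job submitted by submit_job.
c = parcluster('hpc');      % same profile name as in the answer above
jobs = c.Jobs;              % all jobs stored for this profile
if isempty(jobs)
    disp('No jobs found for this profile.');
else
    j = jobs(end);          % assumption: the last job is the one just submitted
    wait(j);                % block until the job has finished
    disp(j.State);          % job state, e.g. 'finished' or 'failed'
    diary(j);               % show the Command Window output captured by the job
end
```

Because my_parallel_code is called with zero outputs, there is nothing to retrieve with fetchOutputs; results would need to be written to files by the function itself.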

More Answers (1)

Jason Ross on 18 Feb 2021
This is a general response to your question. A more detailed answer depends on which scheduling software you are running on your HPC cluster and whether MATLAB Parallel Server is installed on it.
If MATLAB Parallel Server is installed on your cluster, you don't need to do anything to start up worker pools; the submission from the MATLAB client will do that for you. The way that you request specific resources is scheduler-dependent, but from within the cluster profile you use the ResourceTemplate to select the resources you want. Depending on how your scheduling software controls resources, these could be a queue, desired processor count, GPU, list of hostnames, etc.
When the job is submitted from the MATLAB client to the cluster, it passes the ResourceTemplate to the cluster, and the cluster then assigns those resources to the job, runs the job, and passes the results back. All the scheduling, workers, etc. are handled by the scheduler plus the install of Parallel Server on the cluster.
You might also find the batch command useful, as it will create a worker on the cluster for the job and then create the worker pool on the cluster. The job then runs in the background and frees up your session.
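As a rough illustration of the ResourceTemplate idea, the sketch below assumes that the 'hpc' profile from the accepted answer is a Slurm cluster profile whose cluster object has a ResourceTemplate property; other scheduler types use different template syntax or properties. The hostnames node01 and node02 and the pool size of 3 are placeholders, and --nodelist is an sbatch option rather than anything specific to MATLAB.

```matlab
% Minimal sketch, assuming a Slurm-type cluster profile named 'hpc'.
c = parcluster('hpc');

% ^N^ is filled in by MATLAB with the number of workers the job needs.
% node01,node02 are hypothetical hostnames; replace them with real ones.
c.ResourceTemplate = '--ntasks=^N^ --nodelist=node01,node02';

% Submit my_parallel_code (0 outputs, no inputs) with a pool of 3 workers.
% One extra worker runs the function itself, so 4 workers are requested.
j = batch(c, @my_parallel_code, 0, {}, 'Pool', 3);
```

Setting the property on the cluster object affects only this session unless the profile is saved; whether specific hostnames can be requested at all depends on how the cluster's scheduler is configured.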

Release

R2020b
