How to set up a MATLAB parallel cluster for a thread-based environment

Hi,
I am starting to explore MATLAB's parallel functionality and, I have to say, I am a bit confused about the process-based vs. thread-based environments.
First question: I have 2 clusters, namely, the local cluster and the "MatlabCluster" (remote cluster with 8 nodes, 32 workers). If I use
pool = parpool('MatlabCluster');
the default environment is the "process-based" environment. Correct? Can I use the remote cluster in a "thread-based" environment? If I do
pool = parpool('threads');
only the local cluster switches to 'thread'. Can I do the same with the remote cluster?
Second question: I am experimenting with distributed arrays. However, if I start the 'MatlabCluster' (remote cluster), I get a few errors, and the last error message is
No workers are available for FevalQueue execution
This happens on the line of code that uses distributed arrays. I read that FevalQueue is not supported in the "thread-based" environment. Does this error mean that, by default, the remote cluster is starting as "thread-based" (which would contradict my first hypothesis)?

Accepted Answer

Raymond Norris on 16 Jun 2021
The thread-based pool runs only on the same machine as the MATLAB client, similar to a local process-based pool. However, unlike the local pool, the threaded pool has a fixed startup size, which is the value returned by maxNumCompThreads. If you want a threads pool to start with a different number of workers, set that number first with maxNumCompThreads. For example:
% Let's assume you have 8 physical cores, but only want to start a threaded
% pool of 2 workers.
old_threads = maxNumCompThreads(2);
parpool("threads");
Starting parallel pool (parpool) ...
Connected to the parallel pool (number of workers: 2).
ans =
  ThreadPool with properties:
    NumWorkers: 2
Keep in mind that setting maxNumCompThreads, in addition to affecting the number of workers started, may have an effect on your other MATLAB code.
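One way to limit that side effect is to keep the value returned by maxNumCompThreads and put it back once the threaded pool is no longer needed. The following is only a minimal sketch, assuming no pool is currently open; the parfor loop is a placeholder workload and the variable names are illustrative:

```matlab
% Sketch: use a 2-worker thread pool, then restore the previous thread limit.
% Assumes no parallel pool is open when this runs.
oldThreads = maxNumCompThreads(2);   % returns the limit that was in effect before
pool = parpool("threads");

results = zeros(1, 8);
parfor k = 1:8
    results(k) = k^2;                % placeholder workload
end

delete(pool);                        % shut down the thread pool
maxNumCompThreads(oldThreads);       % restore the previous limit
```

Restoring the limit only after the pool is deleted avoids changing the setting while the threaded pool is still in use.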
You'll need to post a bit more (code, errors) to decipher the FevalQueue error.
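In the meantime, to check whether the pool that is currently open is thread-based or process-based, its class can be inspected. This is a small sketch that assumes the thread pool class is parallel.ThreadPool (as in recent releases) and treats any other pool as process-based:

```matlab
% Sketch: report what kind of pool is currently open, without starting one.
pool = gcp("nocreate");              % empty if no pool is open
if isempty(pool)
    disp("No parallel pool is open.");
elseif isa(pool, "parallel.ThreadPool")
    disp("The current pool is thread-based.");
else
    fprintf("The current pool is process-based (class %s, %d workers).\n", ...
        class(pool), pool.NumWorkers);
end
```

Running this after parpool('MatlabCluster') should show whether the remote pool really started as thread-based.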
2 Comments
Maria on 16 Jun 2021
Edited: Maria on 16 Jun 2021
Thank you for your answer. With respect to the FevalQueue error, I am running some more tests, and I am starting to think that there is a problem with the distributed memory of the nodes. I have some issues with the cluster that we set up, and I have been in contact with MathWorks support for a week or so. However, I am able to run the remote cluster to some extent. I tried some code with a couple of parfor loops and I could see that all 32 workers were working.
Now, I tried to run a very simple test:
A = magic(4);
B = distributed(A);
And I get the warning:
Warning: The SPMD infrastructure has been initializing for 94 seconds. This may indicate a problem in initialization.
You might need to restart the pool.
And then
Error using distributed (line 282)
One or more futures resulted in an error.
Caused by:
No workers are available for FevalQueue execution.
The cluster has 8 nodes and 32 workers, all running Debian 10.9 (Buster). The client machine is also Linux-based. The job scheduler is mjs. The firewall is disabled on all nodes; we already ran tests that included disabling the firewall on the client machine, and we excluded it as the problem.
During validation, the parpool "hangs" and I have to terminate MATLAB manually because it stops responding. This happens only when we use more than 1 node in the cluster.
How do I check the memory setup among the nodes of the cluster?
Raymond Norris on 16 Jun 2021
To get the memory on the Linux nodes, run
free -mth
This will give you the free & used memory.
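The same command can also be run from inside MATLAB on the workers. The sketch below assumes a process-based pool is already open on the Linux cluster and that parfor works there (as reported above); note that parfor does not guarantee exactly one iteration per worker, so some nodes may appear more than once or not at all:

```matlab
% Sketch: collect hostname and memory report from the workers of the open pool.
% Assumes a process-based pool on Linux nodes is already running.
pool = gcp("nocreate");
n = pool.NumWorkers;
reports = cell(1, n);
parfor k = 1:n
    % system returns the exit status and the command's text output
    [~, out] = system('hostname; free -mth');
    reports{k} = out;
end

% Print each collected report on the client.
for k = 1:n
    fprintf("%s\n", reports{k});
end
```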

Release

R2021a
