How is labindex assigned to the workers on a node/machine in MDCS?

We know that in MDCS we can choose to run more than one worker on a node/machine, say 4 workers per node. So how is labindex assigned to these 4 workers? Is it always 1, 2, 3, 4 on each node, does it increase continuously from node to node (e.g. 5-8, 9-12, ...), or is it completely random (e.g. 1, 3, 9, 6 on one node/machine)?

Accepted Answer

Edric Ellis on 25 May 2018
You don't specify which cluster type you're using with MDCS, but I'm going to assume MJS for now. (Not all of what follows will be scheduler-specific).
Within an spmd block, labindex equals the index of the task executing on that worker. So, if you have 2 nodes each running 4 workers, and you run a single communicating job of size 8 (i.e. parpool('myMjsCluster', 8)), then the task indices are 1:8, as are the corresponding values of labindex.
MJS will endeavour to schedule things such that consecutive tasks are co-located on a single node - i.e. it will attempt to put tasks 1:4 on the first node, and 5:8 on the second. (Most other scheduler types will end up doing something similar, but by a different means).
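As a rough illustration of the contiguous layout described above, the arithmetic below computes which node a given labindex would land on, and its position on that node, assuming every node runs the same number of workers and tasks are placed in consecutive blocks. This is a plain-Python model, not MATLAB code, and `expected_placement` is a hypothetical helper. Because MJS only endeavours to co-locate consecutive tasks (and other schedulers may do it differently), the result is the idealized layout, not a guarantee.

```python
from typing import Tuple


def expected_placement(labindex: int, workers_per_node: int) -> Tuple[int, int]:
    """Return (node, local_index), both 1-based like MATLAB.

    Assumes tasks 1..w go to node 1, w+1..2w to node 2, and so on.
    """
    if labindex < 1 or workers_per_node < 1:
        raise ValueError("labindex and workers_per_node must be >= 1")
    node = (labindex - 1) // workers_per_node + 1
    local_index = (labindex - 1) % workers_per_node + 1
    return node, local_index


if __name__ == "__main__":
    # 2 nodes x 4 workers, one communicating job of size 8
    for k in range(1, 9):
        print(k, expected_placement(k, 4))
```

Under this assumption, labs 1-4 map to node 1 and labs 5-8 to node 2, with local positions 1-4 on each node.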
Basically, what you need to do is come up with a mapping of labindex to hostname to work out which labs are located on which host, and then you can use that "local labindex" to pick which Java program to use. Here's one way.
spmd
    [s, hostname] = system('hostname');
    assert(s == 0, 'Failed to compute hostname');
    hostname = strtrim(hostname);
    % Get a list of all hostnames in the pool
    allHostnames = gcat({hostname}, 1);
    % Work out which labindex values are on this host
    allLabs = 1:numlabs;
    labsOnThisHost = allLabs(strcmp(hostname, allHostnames))
    % Work out this lab's position among the labs on this host
    myIndexOnThisHost = find(labindex == labsOnThisHost)
end
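To see what the spmd block computes, here is a plain-Python model of its logic (not MATLAB, and not part of any MathWorks API). It assumes `all_hostnames` is the list that `gcat({hostname}, 1)` would return, so entry k-1 is the host of lab k; `local_lab_info` is a hypothetical name. Because it only compares hostnames, it gives the right answer even when labs are not placed in contiguous blocks.

```python
from typing import List, Tuple


def local_lab_info(all_hostnames: List[str], labindex: int) -> Tuple[List[int], int]:
    """Model of the spmd block for one lab.

    all_hostnames[k - 1] is the host of lab k (1-based labindex, as in MATLAB).
    Returns (labs_on_this_host, my_index_on_this_host), both 1-based.
    """
    hostname = all_hostnames[labindex - 1]
    # Equivalent of allLabs(strcmp(hostname, allHostnames)): ascending labindex values
    labs_on_this_host = [k for k, h in enumerate(all_hostnames, start=1) if h == hostname]
    # Equivalent of find(labindex == labsOnThisHost)
    my_index_on_this_host = labs_on_this_host.index(labindex) + 1
    return labs_on_this_host, my_index_on_this_host
```

For example, with labs interleaved across two hosts, `local_lab_info(["nodeA", "nodeB", "nodeA", "nodeB"], 3)` returns `([1, 3], 2)`: lab 3 shares nodeA with lab 1 and is the second lab there.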

More Answers (1)

Walter Roberson on 25 May 2018
"The value of labindex spans from 1 to n, where n is the number of workers running the current job, defined by numlabs"
"This was done by pause a random seconds and then detect if there is ###.exe running in the tasklist of this node."
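I would probably think in terms of having:
% Executed on every lab (for example, inside spmd)
if labindex == 1
    % Check in case the external software is somehow already running;
    % otherwise, launch it.
    % Do any waiting needed for the external software to be ready to go.
end
% No lab continues past this point until lab 1 has finished the setup above
labBarrier();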
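The point of the barrier in this pattern is that no lab moves past `labBarrier()` until lab 1 has finished its setup. The sketch below models that with Python threads, using `threading.Barrier` in place of `labBarrier()` and a flag assignment as a stand-in for launching and waiting for the external program; `run_labs` and the `external_ready` flag are hypothetical. Each simulated lab records whether it saw the setup as finished once it got past the barrier.

```python
import threading
from typing import List


def run_labs(numlabs: int) -> List[bool]:
    """Simulate numlabs labs: lab 1 does one-time setup, then all meet at a barrier.

    Returns, for each lab, whether it saw the setup as finished after the barrier.
    """
    barrier = threading.Barrier(numlabs)  # stands in for labBarrier()
    state = {"external_ready": False}
    seen = [False] * numlabs

    def lab(labindex: int) -> None:
        if labindex == 1:
            # Placeholder for: check whether the external program is already
            # running; if not, launch it and wait until it is ready.
            state["external_ready"] = True
        barrier.wait()  # no lab passes until every lab, including lab 1, arrives
        seen[labindex - 1] = state["external_ready"]

    threads = [threading.Thread(target=lab, args=(k,)) for k in range(1, numlabs + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

`run_labs(4)` returns `[True, True, True, True]`, because lab 1 sets the flag before it reaches the barrier and the other labs cannot pass the barrier until it arrives.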
1 Comment
raym on 25 May 2018
Thanks, Roberson. Your code is a better way to share the external software, but I am not sure every machine has a worker with labindex 1. In fact, that is the key point of this question.
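One way to address this concern (a suggestion, not something stated in the thread) is to combine the two answers: since only one lab in the whole pool has labindex 1, pick one launcher per host instead, namely the lab with the lowest labindex on that host. In the MATLAB code from the accepted answer, that corresponds to testing `myIndexOnThisHost == 1` rather than `labindex == 1`. The plain-Python model below shows the selection; `launcher_per_host` and `is_launcher` are hypothetical helpers, and `all_hostnames` stands for the list `gcat({hostname}, 1)` would return (entry k-1 is the host of lab k).

```python
from typing import Dict, List


def launcher_per_host(all_hostnames: List[str]) -> Dict[str, int]:
    """Map each host to the lowest labindex running on it (1-based)."""
    launchers: Dict[str, int] = {}
    for labindex, host in enumerate(all_hostnames, start=1):
        # Labs are visited in ascending order, so the first one seen on a
        # host has the lowest labindex there.
        launchers.setdefault(host, labindex)
    return launchers


def is_launcher(all_hostnames: List[str], labindex: int) -> bool:
    """True if this lab should start the external program on its host."""
    host = all_hostnames[labindex - 1]
    return launcher_per_host(all_hostnames)[host] == labindex
```

With contiguous placement on two 4-worker nodes, the launchers are labs 1 and 5; with interleaved placement such as `["nodeB", "nodeA", "nodeB", "nodeA"]`, they are labs 1 (nodeB) and 2 (nodeA). Either way, exactly one lab per host does the launch.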
