Is there an easy way to find out which workers are running on the same host in a Generic Cluster job so I can efficiently allgather?
Frank Moore-Clingenpeel on 30 Aug 2022
Say I have the following script which submits a job to a Generic parallel cluster, which has procsPerNode=2:
What this does is request 2 nodes from my cluster, each of which runs 2 MATLAB workers in parallel. Altogether, the 4 workers run mySpmdFunction as though it were launched within an spmd statement (so they can do things like use labSend to communicate and labindex to get an ID, etc.).
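To make the setup above concrete, here is a hypothetical task function of the kind mySpmdFunction could be. It is a sketch, not the poster's code: each worker reports its labindex and numlabs and passes its ID one step around a ring with labSendReceive, which sends and receives in one call and so avoids the deadlocks that mismatched labSend/labReceive pairs can cause. The name exampleSpmdTask is made up for illustration.

```matlab
function out = exampleSpmdTask()
%EXAMPLESPMDTASK Hypothetical task function for a communicating (spmd) job.
%   Each worker reports its identity and passes its labindex one step
%   around a ring with labSendReceive to illustrate point-to-point messages.
me = labindex;                      % this worker's ID, 1..numlabs
n = numlabs;                        % total number of workers in the job
if n > 1
    right = mod(me, n) + 1;         % neighbour we send to
    left  = mod(me - 2, n) + 1;     % neighbour we receive from
    fromLeft = labSendReceive(right, left, me);
else
    fromLeft = me;                  % single worker (e.g. called on the client)
end
out = struct('labindex', me, 'numlabs', n, 'fromLeft', fromLeft);
end
```

With 4 workers, worker 1 should receive 4 from its left neighbour and worker 3 should receive 2.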
My question is: is there any way for the workers to know which other workers are 'local', i.e., which ones reside on the same piece of hardware and which ones are remote? A way to query this information at run time (reflection) is preferred, but if that's not available, does MATLAB consistently assign workers to nodes sequentially (so that, in the example, workers 1 and 2 always share a node and workers 3 and 4 always share a node)? If there's no way to query which workers share nodes, is there a way to find the GenericCluster the workers are running on so I can read its procsPerNode property?
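One way to get this at run time, without depending on how labindex values happen to be laid out across nodes, is to have every worker look up its own host name and share it with gcat. This is a minimal sketch, assuming the `hostname` command is available on every node and that workers on the same machine report the same name; findLocalWorkers and groupByHost are hypothetical names. For the last part of the question, getCurrentCluster returns the cluster object on a worker, but where procsPerNode is stored depends on how your Generic cluster's integration scripts define it.

```matlab
function [localLabs, nodeIds, hosts] = findLocalWorkers()
%FINDLOCALWORKERS Group communicating-job workers by host (hypothetical helper).
%   Call on every worker at the same point in the code: gcat is a collective.
[status, host] = system('hostname');
if status ~= 0
    error('findLocalWorkers:hostname', 'Could not run the hostname command.');
end
hosts = gcat({strtrim(host)});              % 1-by-numlabs cell, in labindex order
[localLabs, nodeIds] = groupByHost(hosts, labindex);
end

function [localLabs, nodeIds] = groupByHost(hosts, myLab)
%GROUPBYHOST Node number of every lab, plus the labs that share myLab's node.
[~, ~, nodeIds] = unique(hosts, 'stable');  % node numbers in first-seen order
nodeIds = nodeIds(:).';                     % force a row vector
localLabs = find(nodeIds == nodeIds(myLab));% labindex values on my node
end
```

If, say, workers 1 and 3 landed on one machine and workers 2 and 4 on another, worker 3 would get `localLabs = [1 3]` and `nodeIds = [1 2 1 2]`, so the sketch does not assume sequential placement.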
For that matter, is there a built-in allgather function? I'm only investigating this because I want to implement my own allgather from scratch.
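Parallel Computing Toolbox already has an allgather-style collective: gcat concatenates a value from every worker in labindex order and replicates the result on all workers, so `gcat({x})` returns a 1-by-numlabs cell array (gop covers general reductions). If you still want to experiment with a locality-aware version, below is a minimal two-level sketch: gather to a per-node leader, exchange among leaders, then send the result back within each node. It assumes `nodeIds` is a 1-by-numlabs vector of node numbers, identical on all workers, built for example by gathering each worker's host name with gcat and passing the names through `unique(..., 'stable')`. The function name and tag values are hypothetical, and whether this beats gcat depends on the interconnect and on how gcat is implemented, so it is worth timing both.

```matlab
function gathered = nodeAwareAllgather(x, nodeIds)
%NODEAWAREALLGATHER Hypothetical two-level allgather; call on every worker.
%   x       - this worker's contribution (any MATLAB value)
%   nodeIds - node number of each labindex (1-by-numlabs, same on all workers)
%   Returns a 1-by-numlabs cell array ordered by labindex, like gcat({x}).
nodeIds = nodeIds(:).';                      % row vector so for-loops go elementwise
me = labindex;
n = numlabs;
localLabs = find(nodeIds == nodeIds(me));    % labindex values on this node
leader = localLabs(1);                       % lowest labindex leads its node
[~, firstIdx] = unique(nodeIds, 'stable');   % first labindex seen on each node
leaders = sort(firstIdx(:)).';               % ascending order keeps step 2 deadlock-free
tagUp = 101; tagDown = 102; tagPeer = 103;   % arbitrary message tags

if me ~= leader
    labSend(x, leader, tagUp);               % step 1: hand my value to the leader
    gathered = labReceive(leader, tagDown);  % step 3: receive the full result
    return
end

gathered = cell(1, n);
gathered{me} = x;
for k = localLabs(localLabs ~= me)           % step 1: collect this node's values
    gathered{k} = labReceive(k, tagUp);
end

block = {localLabs, gathered(localLabs)};    % this node's slice with its indices
for p = leaders(leaders ~= me)               % step 2: pairwise exchange among leaders
    peer = labSendReceive(p, p, block, tagPeer);
    gathered(peer{1}) = peer{2};
end

for k = localLabs(localLabs ~= me)           % step 3: share the result on the node
    labSend(gathered, k, tagDown);
end
end
```

Inside a communicating job, `nodeAwareAllgather(x, nodeIds)` should return the same cell array as `gcat({x})`, which makes gcat a convenient reference when checking a hand-written version.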