Distributed arrays unevenly distributed

Maria on 13 Oct 2021
Commented: Oli Tissot on 12 Nov 2021
I have a remote cluster with 8 nodes, each with 16 GB of memory.
I am running an example with a large 3D matrix of size around 10000 x 4500 x 8, launched as a batch job. The matrix is created directly inside the function as a distributed array:
H_sym = zeros(m,m,LENGTH_BETA,'distributed')+1j*zeros(m,m,LENGTH_BETA,'distributed');
However, when I check the status of each node (on Linux, with htop), I see that all cores on all nodes are busy, and every node except the first holds a constant 4 GB of memory. On the first node, memory usage fluctuates between 8 GB and 13 GB.
Why does only the first node use more memory, and why does its usage change over time? Shouldn't 'distributed' spread the matrix evenly across all nodes?
  1 Comment
Oli Tissot on 12 Nov 2021
Hi Maria,
When distributed arrays are constructed, they are distributed as evenly as possible along the second dimension. In your case, that means 4500 is split into 8 parts, so some workers get 10000x562x8 local parts whereas others get 10000x563x8 local parts. So not all workers use exactly the same amount of memory, but I believe that does not explain the discrepancy you're seeing. I suspect the computation you run afterwards on H_sym involves communication between workers, so the workers receiving messages use more memory. That could explain what you are seeing. What computation are you doing on H_sym after creating it?
Finally, the way you're building H_sym is correct, but there is a more efficient way to do it:
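To put rough numbers on the split described above, here is a minimal sketch in plain MATLAB. It assumes one worker per node (8 workers), complex double elements at 16 bytes each, the 10000 x 4500 x 8 size quoted in the question, and that any leftover columns go to the lowest-numbered workers. The variable names are illustrative only.

```matlab
% Rough estimate of local part sizes for a 10000 x 4500 x 8 complex array
% split along the second dimension across 8 workers (assumed: 1 per node).
numWorkers   = 8;
dims         = [10000 4500 8];
bytesPerElem = 16;                      % complex double: 8 bytes real + 8 bytes imaginary

base  = floor(dims(2) / numWorkers);    % columns every worker gets
extra = mod(dims(2), numWorkers);       % leftover columns to hand out one by one

% Assumption: the leftover columns go to the first 'extra' workers.
cols = base + ((1:numWorkers) <= extra);

localGB = dims(1) .* cols .* dims(3) .* bytesPerElem / 1e9;
totalGB = prod(dims) * bytesPerElem / 1e9;

for w = 1:numWorkers
    fprintf('worker %d: 10000x%dx8, about %.3f GB\n', w, cols(w), localGB(w));
end
fprintf('whole array: about %.2f GB\n', totalGB);
```

Under these assumptions each local part is roughly 0.72 GB and the gap between a 562-column part and a 563-column part is about 1.3 MB, which is far too small to account for a difference of several GB between nodes.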
H_sym = zeros(m,m,LENGTH_BETA,'like',distributed(1i));
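One way to check how the array is actually laid out, rather than inferring it from htop, is to inspect each worker's local part. The following is a minimal sketch, assuming Parallel Computing Toolbox and an available parallel pool. The sizes m = 200 and LENGTH_BETA = 8 are small hypothetical values chosen so it runs quickly, not the sizes from the question.

```matlab
% Small hypothetical sizes; the question uses much larger values.
m = 200;
LENGTH_BETA = 8;

% Build the complex distributed array directly, as suggested above.
H_sym = zeros(m, m, LENGTH_BETA, 'like', distributed(1i));

% Inside spmd, a distributed array is seen as codistributed, so each worker
% can report its own local part and the distribution scheme in use.
spmd
    codist    = getCodistributor(H_sym);
    distDim   = codist.Dimension;       % dimension the array is split along
    partition = codist.Partition;       % number of slices held by each worker
    localSize = size(getLocalPart(H_sym));
end

% Back on the client, the spmd variables are Composite objects,
% indexed with braces by worker number.
pool = gcp;
for w = 1:pool.NumWorkers
    fprintf('worker %d: local part %s\n', w, mat2str(localSize{w}));
end
fprintf('distribution dimension: %d, partition: %s\n', ...
        distDim{1}, mat2str(partition{1}));
```

If the reported local parts are close to equal in size, the array itself is not the source of the imbalance, which would point back to the later computation on H_sym.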


Answers (0)