How to avoid broadcast variable while optimizing a cost function in parallel computing?

Question

I'm trying to minimize a heavy cost function (the biggest matrix in it is 2500x2500) using PSO with parallel computing. A single (!) iteration takes a couple of days and I'm not sure why. I would be very thankful for any help.

I use parallel computing in order to speed things up, but for now I get the message "The entire array or structure 'CostFunction' is a broadcast variable. This might result in unnecessary communication overhead". These are the problematic lines:

parfor i=1:nPop
    % Evaluation (position value in the cost function)
    particle(i).Cost = CostFunction(particle(i).Position);
end

Here CostFunction is a function handle I defined earlier in the code, and its input changes on each iteration.

Using the MATLAB profiler I managed to get statistics on the running time of my code, which show that most of the running time is spent in that single parfor loop.

Here ICF is my original cost function, and diss and null are its children. As I understand from the flame graph, ICF and its children are not children of the parfor loop, hence the running time is divided between the loop and the cost function separately. I don't recognize the time-consuming Java method, but I do know it is part of the parallel process.

So I'm basically asking two questions:

Is the broadcast variable problem the cause of the long running time?
How can I avoid broadcasting my cost function?

Thanks in advance.

Answer 1

Edric Ellis on 7 Dec 2022

Investigating performance of parfor loops can be a bit tricky. Here are a few pointers:

Do you happen to know whether your function already benefits from MATLAB's intrinsic multi-threading? (Check using your system's "Task Manager" or equivalent.) If so, using only local workers with PCT will not speed things up, as you are already using all of your machine's resources. (Process workers run in single-threaded mode, so each worker might well process things more slowly than your client, but if you've got several of them you can still get a speedup overall.)
You can check the data transfer size using ticBytes and tocBytes. However, 2500x2500 is not particularly large, and I wouldn't expect it to cause things to take that long.
You can use mpiprofile to profile the execution time on the workers. The client profile only shows that you're waiting for workers to complete their work. (This works fine with parfor, despite the name.)
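
The three pointers above could be tried out roughly as follows. This is only a minimal sketch, assuming a process-based pool from Parallel Computing Toolbox. The matrix A, the value of nPop, the random positions and the cheap CostFunction are placeholders invented for illustration, not the poster's actual code.

```matlab
% --- Placeholder setup (all values here are assumptions for the sketch) ---
nPop = 8;                                   % hypothetical swarm size
A = rand(200);                              % small stand-in for the 2500x2500 matrix
CostFunction = @(x) norm(A*x);              % handle that captures A
particle = repmat(struct('Position', [], 'Cost', []), nPop, 1);
for i = 1:nPop
    particle(i).Position = rand(200, 1);
end

pool = gcp;                                 % start or reuse the current pool

% --- Pointer 1: compare computational threads on the client and a worker ---
clientThreads = maxNumCompThreads;          % threads available to the client
f = parfeval(pool, @maxNumCompThreads, 1);  % ask one worker the same question
workerThreads = fetchOutputs(f);
fprintf('Client threads: %d, worker threads: %d\n', clientThreads, workerThreads);

% --- Pointers 2 and 3: measure data transfer and profile the workers ---
mpiprofile on                               % start profiling on the workers
ticBytes(pool);                             % start counting transferred bytes
parfor i = 1:nPop
    % Evaluation (position value in the cost function)
    particle(i).Cost = CostFunction(particle(i).Position);
end
bytes = tocBytes(pool);                     % one row per worker: [sent, received]
stats = mpiprofile('info');                 % per-worker profiling data
mpiprofile off

fprintf('Total bytes sent to workers: %d\n', sum(bytes(:, 1)));
% To browse the worker profiles interactively, run "mpiprofile viewer"
% instead of collecting the data with mpiprofile('info').
```

If the byte counts turn out to be small compared with the time spent inside the cost function on the workers, the broadcast warning is unlikely to be the main cause of the long running time.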
