Restart a parpool worker

Raghavasimhan Thirunarayanan
Answered: Edric Ellis on 16 Jun 2020
Hello,
When I run parfor, a worker sometimes terminates with an error, and the simulation continues on the remaining workers. Is there a way to restart that parpool worker automatically, without having to stop and relaunch the simulation? I am at my wits' end as to how to achieve this.
Thanks

Answers (1)

Edric Ellis on 16 Jun 2020
Unfortunately, there's no simple way to do this when using parfor with parpool. I can think of a few workarounds that might help, depending very much on how your problem is set up.
Firstly, you could try the "cluster parfor" approach, where you don't launch a parpool at all and instead let the cluster run the loop directly. This is described in the doc here: https://www.mathworks.com/help/parallel-computing/parforoptions.html (see the section "Run parfor on a Cluster Without a Parallel Pool"). This approach launches independent tasks on your cluster rather than a parallel pool. It will only give decent performance if the time taken to launch the workers for the independent tasks is small compared to the time taken to run the entire loop. If it works for you, this is highly likely to be the simplest approach.
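As a rough illustration of the cluster parfor approach, here is a minimal sketch. It assumes your default cluster profile is the one you want to use; the loop body and the value of numIterations are placeholders of mine, not part of the original question.

```matlab
% Sketch: run a parfor loop on a cluster without opening a parallel pool.
% Assumption: the default cluster profile is suitable for this work.
cluster = parcluster();            % cluster object for the default profile
opts = parforOptions(cluster);     % loop iterations run as independent tasks

numIterations = 100;               % illustrative value (assumption)
results = zeros(1, numIterations);

parfor (idx = 1:numIterations, opts)
    % Placeholder workload standing in for one simulation step.
    results(idx) = max(abs(eig(rand(200))));
end
```

Because no pool is involved, there is no pool worker to restart. The cost is the per-task startup time mentioned above.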
Secondly, if you can restructure your code to use parfeval instead of parfor, you could check the NumWorkers property of the parallel pool while consuming results and restart the pool if it decreases. This is considerably more work, because you need to keep track of the incomplete work and resubmit it.
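The bookkeeping for the parfeval approach could look something like the sketch below. The names workFcn, numTasks and maxRestarts, the 5-second polling timeout, and the choice to open the pool with 'SpmdEnabled', false (so that it can keep running after losing a worker) are all assumptions of mine. I have not verified exactly how futures behave when their worker dies, so the sketch simply treats any error, or any drop in NumWorkers, as a reason to discard the pool and resubmit everything that has not finished.

```matlab
% Sketch only: resubmit unfinished parfeval work after a worker is lost.
workFcn = @(k) max(abs(eig(rand(200))));  % placeholder workload (assumption)
numTasks = 50;                            % illustrative value (assumption)
maxRestarts = 3;                          % give up after this many restarts

results = cell(1, numTasks);
done = false(1, numTasks);                % which tasks have a stored result
restarts = 0;

while ~all(done)
    pool = gcp('nocreate');
    if isempty(pool)
        pool = parpool('SpmdEnabled', false);
    end
    startWorkers = pool.NumWorkers;

    % Submit every task that has not completed yet.
    pending = find(~done);
    clear futures
    futures(1:numel(pending)) = parallel.FevalFuture;
    for n = 1:numel(pending)
        futures(n) = parfeval(pool, workFcn, 1, pending(n));
    end

    try
        while ~all(done(pending))
            % Wait up to 5 seconds for the next result; n is empty on timeout.
            [n, value] = fetchNext(futures, 5);
            if ~isempty(n)
                results{pending(n)} = value;
                done(pending(n)) = true;
            elseif pool.NumWorkers < startWorkers
                error('sketch:workerLost', 'The pool lost a worker.');
            end
        end
    catch err
        restarts = restarts + 1;
        if restarts > maxRestarts
            rethrow(err);
        end
        warning('sketch:restart', 'Restarting pool: %s', err.message);
        delete(gcp('nocreate'));          % next pass opens a fresh pool
    end
end
```

Results that arrived before the failure are kept, so each restart only reruns the tasks still marked as not done. The restart limit is there so that a task that fails deterministically does not loop forever.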
A third approach might be to restructure your parfor loop to send its results back using a DataQueue. If you launch the parpool with the 'SpmdEnabled', true option, the pool automatically shuts down any time a worker crashes. The idea is that the client stores the partial results of your loop as they arrive on the DataQueue. The parfor loop terminates with an error when a worker crashes, but you still have the partial results, so you can start a new pool and run a parfor loop over the incomplete iterations.
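Here is one way the DataQueue idea might be sketched. It is written as a function file so that a nested function can store incoming results on the client. The function name resilientParfor, the placeholder workload, and the maxAttempts limit are assumptions of mine. Messages still in flight when the loop ends may not have been stored yet, in which case those iterations are simply run again.

```matlab
function results = resilientParfor(numIterations)
% Sketch only: keep partial parfor results via a DataQueue and rerun
% the missing iterations on a fresh pool after a worker crash.
results = zeros(1, numIterations);
done = false(1, numIterations);    % iterations whose result has arrived
maxAttempts = 5;                   % assumption: stop retrying after this
attempt = 0;

queue = parallel.pool.DataQueue;
afterEach(queue, @storeResult);    % runs on the client for each message

while ~all(done)
    attempt = attempt + 1;
    if attempt > maxAttempts
        error('sketch:tooManyAttempts', 'Gave up after %d attempts.', maxAttempts);
    end
    if isempty(gcp('nocreate'))
        parpool('SpmdEnabled', true);   % pool shuts down if a worker crashes
    end

    todo = find(~done);            % only the incomplete iterations
    try
        parfor n = 1:numel(todo)
            idx = todo(n);
            value = max(abs(eig(rand(200))));   % placeholder workload
            send(queue, [idx, value]);          % report back immediately
        end
    catch err
        warning('sketch:parforFailed', 'parfor failed: %s', err.message);
        delete(gcp('nocreate'));   % make sure the next pass gets a new pool
    end
end

    function storeResult(msg)
        % msg is [iterationIndex, value], as sent from the worker.
        results(msg(1)) = msg(2);
        done(msg(1)) = true;
    end
end
```

Calling results = resilientParfor(100) would then return once every iteration has reported a value, or raise an error after the attempt limit is reached.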
