How to shut down all running workers of parpools?

Felix on 6 Mar 2023
Commented: Davy Figaro on 16 May 2024
How can I find and shut down all workers of all parpools that might currently be running?
During debugging I frequently run into crashes and out-of-memory errors. Often, some worker processes keep running, and I would like to know how best to close all of them before starting another script.

Answers (3)

Raymond Norris on 6 Mar 2023
Hi @Felix. Even if only a single worker crashes, all workers will terminate. Can you elaborate a bit more on a couple of things?
  1. Are you using a local pool or a cluster? If cluster, MJS or your own scheduler (and if so, which)?
  2. Which parallel constructs are you using (parfor, parfeval, etc.)? Can you give a simple example of what might crash? I'm not interested in the details (I'm sure the workers are crashing), but rather in how you're running the code.
  1 Comment
Edric Ellis on 7 Mar 2023
Note that on "local" and MJS clusters, the parallel pool will not necessarily immediately terminate when a single worker crashes. On those clusters, pools that have not yet used spmd can survive losing workers.

Edric Ellis on 7 Mar 2023
You can shut down all remaining workers of the currently running pool by executing:
delete(gcp('nocreate'))
There should be no running workers other than in the current pool.
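As a minimal sketch of the same idea, the one-liner above can be expanded so that it reports what it is doing. The variable name pool and the printed messages are illustrative assumptions, not part of the original answer; gcp('nocreate') returns an empty array when no pool is open, so the check is safe to run at any time.

```matlab
% Sketch: close the current parallel pool only if one exists.
pool = gcp('nocreate');   % empty when no pool is running; never starts a new one
if isempty(pool)
    disp('No parallel pool is running.');
else
    fprintf('Shutting down pool with %d workers.\n', pool.NumWorkers);
    delete(pool);         % terminates all workers belonging to this pool
end
```

Calling delete on the empty result is also harmless, which is why the one-line form works without the explicit check.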
  1 Comment
Davy Figaro on 16 May 2024
This shuts down the current parallel pool (created with parpool). How can I stop and clear all the workers without shutting down the pool?

Felix on 8 Mar 2023
  1. I'm using local pools on my machine with default settings. On my machine this defaults to 12 workers.
  2. So far, I'm using parfor and the run command with MultiStart problems. I'll sometimes start a pool before running a script via parpool to reduce runtime of that script.
A simple, somewhat pseudocode example of my Monte Carlo stuff might be:
relevant_input = randn(1000, 1);
relevant_output = nan(height(relevant_input), 1);
param = 10;
parpool;  % start a pool with the default profile
% Bind the parameter so that the loop body only passes the sample
my_fun = @(x) elaborate_function(param, x);
parfor h = 1:height(relevant_input)
    relevant_output(h, 1) = my_fun(relevant_input(h));
end
% Local functions must come at the end of a script file
function y = elaborate_function(param, x)
    y = param * x .* sin(x);
end
Another use case is a MultiStart object, which I use with run:
ms = MultiStart('UseParallel', true, 'Display', 'iter');
My scripts sometimes crash and I have trouble restarting them, because some workers do not seem to clear their memory when they crash. When I try to restart I get warnings such as:
Starting parallel pool (parpool) using the 'Processes' profile ...
Preserving jobs with IDs: 10 12 13 because they contain crash dump files.
You can use 'delete(myCluster.Jobs)' to remove all jobs created with profile Processes. To create 'myCluster' use 'myCluster = parcluster('Processes')'.
However, these crash dump files and the preserved jobs hog way too much memory on my machine. I am looking for a couple of lines of code to put at the start of my scripts that search for leftover jobs, such as the ones containing crash dump files, and delete them if they exist, so I don't have to type delete(myCluster.Jobs) myself every time.
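A preamble along these lines might look like the sketch below. It only combines the commands already quoted in the warning above with delete(gcp('nocreate')); the variable name oldJobs and the printed message are illustrative assumptions. Note that it removes every job stored for the 'Processes' profile, not only those with crash dumps, so it assumes no other MATLAB session is using that profile at the same time.

```matlab
% Hypothetical cleanup preamble for the top of a script.
% Assumes the 'Processes' profile named in the warning message.
delete(gcp('nocreate'));               % close any pool still open in this session

myCluster = parcluster('Processes');   % cluster object for the local profile
oldJobs = myCluster.Jobs;              % all jobs stored for this profile
if ~isempty(oldJobs)
    fprintf('Deleting %d leftover job(s).\n', numel(oldJobs));
    delete(oldJobs);                   % also removes the files kept with each job
end
```

After this runs, a subsequent parpool call should start from a clean state, since no pool and no stored jobs remain for the profile.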
  1 Comment
Raymond Norris on 14 Mar 2023
I'm confused about how the crash dump files and preserved jobs hog too much memory. Do you mean disk space?
If a job is running, I'm not sure there would be a crash dump file (until the end). And do you want to delete the crash file or the job? If you're running a parallel pool and the pool crashes, there's no job to delete.

Release

R2022b
