Problem with parfor loop

Hello,
I need help. I get the error below when I use a parfor loop in this code, where x is a square matrix of order N = 75692, approximately 13 GB in size, and t a row vector of
N=length(t);
alpha= [2.4e-7, 2.5e-7, 2.5e-7, 2.3e-6];
beta = [0.5, 0.8, 0.6, 0.5 ];
YT=zeros(N,4);
YD=zeros(N,4);
parfor ima=1:4
[YT(:,ima),YD(:,ima)]=f(t,x,alpha(ima),beta(ima));
end
Error using distcomp.remoteparfor/getCompleteIntervals (line 133)

2 Comments

Jan on 3 Mar 2021
This is a part of the error message. Please copy the complete message.
B.E. on 3 Mar 2021
The complete message is:
Error using distcomp.remoteparfor/getCompleteIntervals (line 133)
The parallel pool that parfor was using has shut down. To start a new parallel pool, run your parfor
code again or use parpool.
Error in Code_Method_of_Caracteristics (line 83)
parfor ima=2:4
A write error occurred while sending to worker 19.


Answers (1)

Edric Ellis on 4 Mar 2021
That error basically means a worker crashed while trying to run the parfor loop. You mention that x is large. If you are using the 'local' cluster, then please be aware that x must be copied to each of the worker processes. This can cause a large amount of memory usage, and it's possible (probable?) that this is causing the workers to shut down.
If you're using a recent version of MATLAB (thread-based pools were introduced in R2020a), you might be able to use parpool('threads'), which uses multiple computational threads in a single process and can avoid some memory duplication. (Not all MATLAB functions can operate in this environment, though.)
Otherwise, you are going to be constrained by the memory on your system. Transferring x to the workers from the client incurs additional duplication while the messages are in transit, so if you're only just exceeding the memory, and you can build x directly on the workers by executing a function, then the following pattern might help:
xC = parallel.pool.Constant(@myFunctionThatBuildsX); % build 'x' directly on the worker
parfor ...
[YT(:,ima),YD(:,ima)]=f(t,xC.Value,alpha(ima),beta(ima));
end
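As a rough, self-contained sketch of the pattern above: the builder function, the small matrix, and the per-iteration computation are all hypothetical stand-ins (the real myFunctionThatBuildsX and f are not shown in this thread). It assumes Parallel Computing Toolbox is available and only illustrates that each worker calls the builder itself, so x is never sent from the client.

```matlab
% Hypothetical stand-in for myFunctionThatBuildsX: returns a small 4x4 matrix.
buildX = @() magic(4);

% Each worker evaluates buildX once and keeps the result for the whole pool session.
xC = parallel.pool.Constant(buildX);

colSums = zeros(4, 4);
parfor k = 1:4
    % xC.Value is the worker-local copy; nothing is transferred from the client here.
    colSums(:, k) = k * sum(xC.Value, 1).';
end
```

If x can only be produced on the client, this pattern does not remove the initial transfer, so it helps mainly when the builder is cheap enough to rerun on every worker.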

5 Comments

Thank you for your help.
I have MATLAB R2016b, and the computer has 512 GB of RAM installed. The matrix x is built with a Runge-Kutta method (ode45), so we cannot construct it on each worker.
I added parpool(4) at the start of the code, but the error is always the same.
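One quick way to check whether memory alone could explain the crash is to compare the size of x, multiplied by the number of copies, with the installed RAM. The sketch below uses a small hypothetical stand-in for x so that it runs anywhere. The pool size and RAM figure are the ones quoted in this thread, and the count of one copy per worker plus one on the client is a simplifying assumption that ignores temporary transfer buffers.

```matlab
x = zeros(1000);            % hypothetical small stand-in; replace with the real x
nWorkers = 4;               % pool size used in parpool(4)
ramBytes = 512e9;           % installed RAM as stated in this thread (512 GB)

info = whos('x');           % whos reports the in-memory size of x
bytesPerCopy = info.bytes;

% Assumption: one copy on the client plus one per worker.
totalBytes = bytesPerCopy * (nWorkers + 1);

fprintf('One copy: %.3f GB; client + %d workers: %.3f GB (%.4f%% of RAM)\n', ...
    bytesPerCopy / 1e9, nWorkers, totalBytes / 1e9, 100 * totalBytes / ramBytes);
```

If the total is far below the installed RAM, the cause is more likely to lie elsewhere, for example in the transfer of x to the workers.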
Edric Ellis on 5 Mar 2021
Perhaps try parpool(1)? That might rule out other problems. If the problems persist even there, then we're going to need some reproduction steps to try and narrow things down.
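A minimal sketch of such a single-worker check, assuming Parallel Computing Toolbox: the matrix and the loop body are hypothetical placeholders, not the original f or x. If this small case runs cleanly but the real loop still fails on one worker, that would point towards the size of the data being sent rather than the number of workers.

```matlab
delete(gcp('nocreate'));    % close any pool that is already open
pool = parpool(1);          % start a pool with exactly one worker
nPoolWorkers = pool.NumWorkers;

xSmall = rand(200);         % hypothetical small stand-in for x
out = zeros(200, 4);
parfor k = 1:4
    % Placeholder computation standing in for f(t, x, alpha(k), beta(k)).
    out(:, k) = k * sum(xSmall, 2);
end

delete(pool);               % shut the pool down again
```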
B.E. on 5 Mar 2021
Edited: B.E. on 5 Mar 2021
Without parfor, that is, with an ordinary for loop, the code works fine.
I used parfor to reduce the computation time, because each step takes about 16 hours to give the result.
Edric Ellis on 5 Mar 2021
Right, but I was wondering if with parpool(1) the single worker still crashed.
B.E. on 5 Mar 2021
Edited: B.E. on 5 Mar 2021
The same error occurs with a single worker:
Error using distcomp.remoteparfor/getCompleteIntervals (line 133)
The parallel pool that parfor was using has shut down. To start a new parallel pool, run your parfor
code again or use parpool.
Error in Code_Method_of_Caracteristics (line 83)
parfor ima=1:4
A write error occurred while sending to worker 1.


Asked: 3 Mar 2021

Edited: 5 Mar 2021
