Parfor reports error which does not exist when running as a for-loop
Hi,
To speed up some calculations I am using a parfor-loop. I have to run calculations on many files and I made a simple parfor-loop which runs a function on all these files. When analysis of one file is finished, the results are saved on disk. So, in principle, there is no communication between the different workers.
I have 12 local workers, and on each worker the first run completes without problems. After that, however, I always get an error message like this (exactly where it happens varies, but the type of message is always the same):
Error using parallel_function (line 598)
In an assignment A(:) = B, the number of elements in A and B
must be the same.
Error stack:
myfunc.m at 162
func>(parfor body) at 45
Error in func (line 14)
parfor ii=151:303
When I run the code in a for-loop, there is no error message.
I have tried several things, but did not find a solution. The problem is that I can't debug this error, because it does not happen when I don't use parfor.
The only thing that works is to reduce the number of workers. When I choose 6 workers, the error doesn't show up.
My temporary solution was to start 2 Matlab sessions, give them each a pool of 6 workers and divide the work manually between the 2 Matlab sessions.
This solution however does not work. In the 2nd Matlab session, the old error appears again after a short while. I really don't understand what the problem is...
10 Comments
Matt J
on 25 Aug 2013
Edited: Matt J
on 25 Aug 2013
Therefore I strongly believe that the error has something to do with how matlab deals with running parallel computations... It can't have anything to do with this C{ii}.
It's still conceivable that both of the above are true simultaneously, i.e., a difference between parallel and serial modes of computation is causing the C{ii} to be read in corrupted in some cases.
We have to start by examining the C{ii} because we have nowhere else to start, and because ample evidence you provided points to it. The error message you posted says there is a dimension mismatch error. Furthermore, you insisted that this error is occurring in the line
C{ii}(C{ii}>0)=C{ii}(C{ii}>0)+prevmax;
That has to mean that prevmax is for some reason either empty or non-scalar some of the time. We must seek ways to trap that condition.
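One way to trap that condition is to check prevmax just before the failing line. The following is only a sketch: the values of ii and prevmax are hypothetical stand-ins, since the real code in myfunc.m is not shown in this thread.

```matlab
% Hypothetical stand-ins for the real loop index and data.
ii = 9;
prevmax = max([]);      % the max of an empty array is itself empty

% Check the condition before the update line is reached.
if ~isscalar(prevmax)
    % Build a report that names the offending size and iteration.
    msg = sprintf('prevmax is %s at ii = %d', mat2str(size(prevmax)), ii);
    disp(msg)
end
```

In the real function, the same check could call error(msg) instead of disp(msg), so that the run stops at the first bad iteration.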
Answers (2)
Walter Roberson
on 25 Aug 2013
You would get that problem if C{ii-1} was empty, leading to prevmax being empty.
Remember, when you have a parfor loop, the iteration for any particular value (e.g., #9) might be done at any time relative to the iteration for the previous value (#8 in this example), so the assignment to C{8}(C{8}>0) might not have been performed before iteration #9, which calls upon C{8}. Indeed, parfor usually starts from the end. This differs from a regular for-loop.
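If the order dependence described above is the cause, one possible restructuring is to keep only the independent work inside parfor and apply the running offset afterwards in an ordinary for-loop. This is a minimal sketch under assumptions: the contents of C, the variable names localMax and offset, and the idea that prevmax is the maximum of the previous (already shifted) cell are all hypothetical, because the original code is not shown.

```matlab
% Hypothetical stand-in data: each cell holds an array with values >= 0.
C = {[0 1 2], [1 0 3], [2 2 0]};
n = numel(C);

% Phase 1 (order-independent): each iteration touches only its own slot.
localMax = zeros(1, n);
parfor ii = 1:n
    % The leading 0 keeps the result scalar even when C{ii} is empty.
    localMax(ii) = max([0, C{ii}(:).']);
end

% Phase 2 (serial): each offset depends on all earlier cells,
% so this part runs in a plain for-loop.
offset = [0, cumsum(localMax(1:end-1))];
for ii = 1:n
    mask = C{ii} > 0;
    C{ii}(mask) = C{ii}(mask) + offset(ii);
end
```

With this layout no parfor iteration reads a value that another iteration writes, so the result should not depend on the order in which the workers run.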
3 Comments
Walter Roberson
on 25 Aug 2013
Put in a try/catch that reports the size of prevmax when the problem is triggered.
Matt J
on 25 Aug 2013
In parallel mode, you'll probably need to do
disp(prevmax)
to report prevmax.
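The try/catch and disp suggestions can be combined roughly as follows. This is a sketch only: Cii stands in for C{ii}, and both it and prevmax are hypothetical values chosen so that the update fails.

```matlab
% Hypothetical inputs chosen so that the update line fails.
Cii = [0 3 5];
prevmax = [1 2 3];      % non-scalar on purpose

caught = false;
try
    Cii(Cii > 0) = Cii(Cii > 0) + prevmax;
catch err
    caught = true;
    % Report what prevmax looked like when the failure happened.
    fprintf('size(prevmax) = %s\n', mat2str(size(prevmax)));
    disp(prevmax)
    disp(err.message)
end
```

In the real code, ending the catch block with rethrow(err) keeps the failure visible, so the report is printed and the run still stops.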
Matt J
on 25 Aug 2013
You might also consider using PMODE to troubleshoot. This will allow you to step through different commands and see their results in the parallel command window.