
Inevitable broadcast variable in parfor?

4 views (last 30 days)
I have a parfor loop where I want to get chunks of a matrix:
For example, below I want the loop to run on 3 parts [indexed by the final column]:
loop 1 is over rows 1 & 2, loop 2 is over rows 3 & 4, and loop 3 is over rows 5 through 8.
example data:
159 1 1
213 5 1
159 2 2
213 5 2
1100 2 3
1590 1 3
1770 1 3
2130 8 3
I made a loop index that gives the row ranges:
1 2
3 4
5 8
Then the code is:
parfor ii = 1:3
    var = example_data(loop_index(ii,1):loop_index(ii,2), 1);
    % do other stuff on var
end
This generates the broadcast-variable warning and runs slowly. I'm struggling to find an easy way to slice the example data variable.
Note that in my real dataset I have thousands of loop iterations, so generating a separate matrix for each iteration could get out of hand.

Accepted Answer

Edric Ellis on 20 Nov 2017
If I've understood correctly, you can convert your non-sliced data into sliced data at the client before the loop starts. One way to do this is with splitapply.
data = [159 1 1
213 5 1
159 2 2
213 5 2
1100 2 3
1590 1 3
1770 1 3
2130 8 3];
% Split 'data' into a cell array, grouped by the final column.
varGroups = splitapply(@(x){x}, data(:,1), data(:,3));
parfor idx = 1:3
    var = varGroups{idx};
    % do stuff ...
end
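For the example data above, varGroups comes back as a 3-by-1 cell array, so each parfor iteration slices out only its own rows (a quick illustration, not part of the original answer):

```matlab
% Contents of varGroups for the example data:
% varGroups{1} -> [159; 213]                 (rows 1-2, group 1)
% varGroups{2} -> [159; 213]                 (rows 3-4, group 2)
% varGroups{3} -> [1100; 1590; 1770; 2130]   (rows 5-8, group 3)
```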
  2 Comments
CJ on 20 Nov 2017
This is what I was looking for. It gives no observable speed boost, though. Isn't that surprising?
Walter Roberson on 20 Nov 2017
Communication overhead, I suspect. You would prefer to batch transfers together for performance, to reduce the number of operating-system calls.


More Answers (1)

Walter Roberson on 19 Nov 2017
In R2017a or later:
On the client (outside the parfor), construct a single data queue, say C. Then use parfevalOnAll to send C to all workers, and have each worker construct its own data queue (W#worker_number) and send it back through C to the client along with its worker ID. Have the client record those per-worker queues.
Now enter the parfor loop, 1 to the number of chunks. Each worker sends its worker number and its parfor loop index through C to the client. The client looks up which rows of the matrix correspond to that chunk number, uses the worker number to find the W* queue specific to that worker, and sends the data to the worker through it.
In this way, it is not necessary to transfer the entire array to all workers: each worker receives only the content of the chunk it needs to work on at the time.
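A rough sketch of that handshake, assuming the R2017a parallel.pool.DataQueue / PollableDataQueue classes; clientDispatch, makeWorkerQueue, and pollWorkerQueue are hypothetical helpers (the last two wrapping a persistent PollableDataQueue on the worker), and the worker-identity lookup is illustrative, so treat this as untested scaffolding rather than working code:

```matlab
% Client side: one queue for worker -> client traffic.
C = parallel.pool.DataQueue;
afterEach(C, @clientDispatch);   % hypothetical: records each worker's
                                 % queue, then answers {workerId, chunkIdx}
                                 % requests by send()-ing the matching rows

% Each worker builds its own pollable queue and mails it back through C.
% The queue handle must be stashed on the worker (e.g. in a persistent
% variable) so the parfor body below can reach it.
parfevalOnAll(@() send(C, {getCurrentTask().ID, makeWorkerQueue()}), 0);

parfor ii = 1:nChunks
    send(C, {getCurrentTask().ID, ii});  % ask the client for chunk ii
    chunk = pollWorkerQueue();           % block until the client replies
    % ... do the work on chunk ...
end
```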
... Though frankly I would expect that the overhead of all this communication would typically lead to lower performance.
If memory transfer is causing significant performance differences because of large arrays, it would probably be faster to do a pre-pass dividing the overall matrix up, one sub-matrix per worker, perhaps into a cell array, and then parfor over 1 to the number of workers, each worker getting one chunk along with information about where the internal boundaries fall within it. Each worker would then do all of the work for its chunk, perhaps using a local for loop, and build its responses into a cell array (each element of which might itself be a cell array of per-internal-chunk responses).
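Such a pre-pass might look like this for the example data (a hedged sketch: the chunk-to-worker assignment, the pool size, and the sum standing in for the real work are all illustrative):

```matlab
data = [159 1 1; 213 5 1; 159 2 2; 213 5 2; ...
        1100 2 3; 1590 1 3; 1770 1 3; 2130 8 3];
chunkRows = {1:2, 3:4, 5:8};   % internal chunk boundaries
nWorkers  = 2;                 % illustrative pool size
assign    = {[1 2], 3};        % chunks 1-2 -> worker 1, chunk 3 -> worker 2

% Pre-pass on the client: one sub-matrix (plus its internal row counts)
% per worker, stored in a cell array so the parfor can slice it.
work = cell(1, nWorkers);
for w = 1:nWorkers
    rows = [chunkRows{assign{w}}];
    work{w} = {data(rows, :), cellfun(@numel, chunkRows(assign{w}))};
end

results = cell(1, nWorkers);
parfor w = 1:nWorkers
    pieces = mat2cell(work{w}{1}, work{w}{2}, 3);  % re-split internally
    out = cell(1, numel(pieces));
    for k = 1:numel(pieces)
        out{k} = sum(pieces{k}(:,1));  % placeholder for the real work
    end
    results{w} = out;
end
```

Only work is transferred, and it is sliced by w, so each worker receives just its own sub-matrix rather than a broadcast copy of the full array.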

Categories

Learn more about Parallel for-Loops (parfor) in Help Center and File Exchange
