Best read-only data strategy for parfor

Robin on 18 Oct 2012
Hi,
I am using parfor on a grid with 60 workers.
I have some data that will be used read-only within the parfor loop.
I see two options: load it on the machine I am submitting from, so that it is serialized and sent across the network (dedicated GigE for the cluster), or load it from disk within the loop.
Can anyone comment on which of these might be the better strategy for different data sizes? The data compresses very well, so it is about 20 MB on disk but more than 1 GB in memory when loaded. How does the speed of loading and uncompressing compare with that of serialization?
If I have it loaded on the submission machine, is MATLAB clever enough to serialize and send it once to each worker, or will it repeat this on every iteration? Obviously, loading from a file would be done on every iteration.
Any advice appreciated.

Answers (1)

Edric Ellis on 18 Oct 2012
I would recommend trying my Worker Object Wrapper. It's designed for just this sort of situation. In your case, you should put the files in a location available to the workers, and have them load the data using something like this:
w = WorkerObjectWrapper( @loadHugeData );
The object 'w' is then effectively a handle to the data. When you pass it into a PARFOR loop, the workers can access the underlying data, like so:
parfor ii = 1:N
    doSomethingWith( w.Value );
end
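
The answer does not show what loadHugeData contains, so the following is only a minimal sketch of one way such a loader could look. The file name hugeData.mat, the variable name bigTable, and the use of tempdir are all assumptions made for illustration; on a real cluster the file would sit in a location that every worker can read. The loader is written here as an anonymous function handle so the example runs as a single script.

```matlab
% Hypothetical file location. On a cluster this would be a shared path
% that every worker can read; tempdir is used only so the sketch runs.
dataFile = fullfile(tempdir, 'hugeData.mat');

% Small stand-in for the real data set, saved once so there is a file to load.
bigTable = magic(4);
save(dataFile, 'bigTable');

% Loader taking no arguments. load() with an output argument returns a
% struct with one field per variable stored in the MAT-file.
loadHugeData = @() load(dataFile);

% Calling the loader directly shows what a worker would end up holding.
data = loadHugeData();
total = sum(data.bigTable(:));
```

With a loader like this, each worker reads the compressed file itself, so the expanded in-memory data never has to be serialized and sent from the submitting machine. If the loader is instead saved as a named function in loadHugeData.m on the workers' path, it can be referenced as @loadHugeData, as in the answer.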
