Default randomization with parfor?
Chris Cox
on 29 Dec 2022
Commented: Chris Cox
on 29 Dec 2022
I have been searching the documentation and the community posts for a description of how random number generation interacts with parallel computing in a parfor loop.
I am aware of this post: https://www.mathworks.com/matlabcentral/answers/53183-random-number-generation-for-parallel-computing-toolbox
Because the post is old, it doesn't resolve my concerns: a lot can change over the years, and the links to documentation in that thread (and in other community threads on random number generation and parallel computing) are broken.
I have read the following reasonably carefully:
- Controlling Random Number Generation
- Generate Random Numbers That Are Repeatable
- Generate Random Numbers That Are Different
and skimmed the parfor documentation for mentions of random number generation. I didn't feel this question was addressed explicitly there (although its examples do include functions that generate random numbers, which perhaps should have implicitly soothed my concerns... but I'm still wringing my hands).
While writing this question, I also found this old blog post. It seems to demonstrate that the default behavior is correct and that you do not have to intervene, but something about how the post is written makes me feel it was meant to demonstrate a problem and then resolve it... I am worried I am missing something:
If there is current documentation that can set me straight, I would appreciate help finding it. Otherwise, I believe this is a small but important gap in the current docs. I appreciate that the Parallel Computing Toolbox may be designed to shelter users from this complexity and to do the right thing whether or not the user knows what they are doing (a good design!), but I still think a statement about this behavior should be easy to find in the parfor documentation, in the "Controlling Random Number Generation" article, or in a dedicated article on random number generation in the Parallel Computing Toolbox. Thank you!
FWIW, my specific application involves a bootstrap resampling procedure. I expect each iteration of the parfor loop to generate an independent random sample, and a naive inspection of the output suggests this is the case: consecutive rows in my output table are different. But since the iteration order is not guaranteed in parfor, there may be dependencies scattered randomly throughout my table that I do not know how to detect...
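A minimal sketch of one way to make bootstrap replicates independent of parfor scheduling. It assumes the pattern from the MATLAB documentation on repeatable random numbers in parfor-loops: one seeded RandStream that supports substreams ('mrg32k3a' does), with the Substream property set to the loop index. The data, nBoot, and seed values are placeholders, not from the post. The outer loop only runs the bootstrap twice to check reproducibility; because replicate b always reads substream b, the result does not depend on which worker runs it or in what order. If no pool is open, parfor runs serially and the sketch behaves the same.

```matlab
% Sketch: reproducible bootstrap of the sample mean inside parfor.
% data, nBoot and seed are illustrative placeholders, not from the post.
data  = (1:50)';                 % placeholder data set
n     = numel(data);
nBoot = 200;                     % number of bootstrap replicates
seed  = 2022;

bootMeans = zeros(nBoot, 2);     % column r holds the replicates from run r
for run = 1:2
    % One seeded mrg32k3a stream; replicate b always uses substream b,
    % so its sample does not depend on worker assignment or iteration order.
    stream = RandStream('mrg32k3a', 'Seed', seed);
    m = zeros(nBoot, 1);
    parfor b = 1:nBoot
        set(stream, 'Substream', b);     % move to the start of substream b
        idx = randi(stream, n, n, 1);    % n indices drawn with replacement
        m(b) = mean(data(idx));
    end
    bootMeans(:, run) = m;
end

fprintf('Runs identical: %d\n', isequal(bootMeans(:, 1), bootMeans(:, 2)));
```

Passing the stream explicitly to randi leaves the global stream untouched; the trade-off is that every random call inside the loop has to take the stream argument.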
Edit: Here is an example of my code. I prefer my code to have a functional flavor. The fun parameter contains a function that is executed for each row in the table. Sometimes fun contains a call to randi or randperm, and these need to produce independent random samples for each row. For the purposes of my question, I think you can substitute randi for fun.
The core of my question is: will the randomization within each function call in the parfor loop be independent?
function out = par_rowfun(fun, tbl, varargin)
% PAR_ROWFUN  Apply fun to each row of tbl inside a parfor loop.
    p = inputParser();
    addRequired(p, "fun", @(x) isa(x, "function_handle"));
    addRequired(p, "tbl", @istable);
    addParameter(p, "InputVariables", [], @isstring);
    addParameter(p, "ExtractCellContents", false, @islogical);
    % Accept both formats handled by the switch at the end.
    addParameter(p, "OutputFormat", "cell", @(x) any(strcmp(x, ["cell", "table"])));
    addParameter(p, "OutputVariableNames", [], @isstring);
    parse(p, fun, tbl, varargin{:});

    vars = p.Results.InputVariables;
    if isempty(vars)
        vars = tbl.Properties.VariableNames;
    end
    extract = p.Results.ExtractCellContents;
    outputfmt = p.Results.OutputFormat;
    outvars = p.Results.OutputVariableNames;

    x = cell(height(tbl), 1);
    [~, ix_vars] = ismember(vars, tbl.Properties.VariableNames);

    parfor i = 1:height(tbl)
        row = tbl(i, ix_vars);
        if extract
            % table2cell takes no name-value options; it already unwraps
            % the contents of cell-array variables.
            args = table2cell(row);
        else
            % Brace-index each variable so cell variables stay wrapped in a cell.
            args = arrayfun(@(k) row{1, k}, 1:width(row), 'UniformOutput', false);
        end
        try
            x{i} = fun(args{:});
        catch ME
            fprintf('par_rowfun failed on row %d\n', i);
            rethrow(ME);
        end
    end

    switch outputfmt
        case "cell"
            out = x;
        case "table"
            out = [tbl(:, vars), table(x(:), 'VariableNames', outvars)];
    end
end
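Because fun calls randi or randperm without a stream argument, those calls read the global stream. A sketch of one way to tie each row to its own substream, assuming it is acceptable to replace the global stream inside the loop; fun, nRows, and seed below are hypothetical stand-ins for the poster's inputs. The set and setGlobalStream lines could go at the top of the parfor body in par_rowfun.

```matlab
% Sketch: give each row its own substream when fun draws from the global
% stream (randi/randperm called without a stream argument).
% nRows, seed and this fun are hypothetical stand-ins, not the post's code.
nRows = 6;
seed  = 7;
fun   = @() randperm(10, 3);    % draws from the global stream

prevStream = RandStream.getGlobalStream();   % client's current global stream
stream = RandStream('mrg32k3a', 'Seed', seed);
draws  = cell(nRows, 1);
parfor i = 1:nRows
    set(stream, 'Substream', i);         % row i always starts at substream i
    RandStream.setGlobalStream(stream);  % route randperm inside fun to it
    draws{i} = fun();
end
RandStream.setGlobalStream(prevStream);      % undo the change on the client

disp(vertcat(draws{:}));
```

Replacing the global stream is a side effect: with a pool open, the workers may keep `stream` as their global stream after the loop, and the sketch restores only the client's stream (the case where parfor ran serially). Passing the stream into fun and calling randi(s, ...) or randperm(s, ...) avoids the side effect entirely.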
1 Comment
Cameron
on 29 Dec 2022
Hi Chris, can you give some specifics about what you are looking for? Maybe post a sample of your code and what your goals would be. Thanks.
Accepted Answer
Walter Roberson
on 29 Dec 2022
The latter shows the steps you need to take if you do want the same random stream on all workers; normally, the streams on the workers are all different.
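To see the default behavior described above, a sketch that draws a few numbers on each worker of an open pool with spmd, then seeds every worker identically with rng. It assumes the Parallel Computing Toolbox and a pool with at least two workers; the seed 42 and the 'twister' generator are arbitrary choices for illustration.

```matlab
% Sketch: default worker streams vs. identically seeded worker streams.
% Assumes Parallel Computing Toolbox and a pool with at least 2 workers.
gcp();                             % start (or reuse) the current pool

spmd
    defaultDraws = rand(1, 3);     % default: each worker has its own stream
end
fprintf('Default streams differ: %d\n', ~isequal(defaultDraws{1}, defaultDraws{2}));

spmd
    rng(42, 'twister');            % same generator and seed on every worker
    seededDraws = rand(1, 3);
end
fprintf('Seeded streams identical: %d\n', isequal(seededDraws{1}, seededDraws{2}));
```

The rng call persists on the workers, so later parallel code in the same pool sees the seeded state unless the pool is restarted or the workers are reseeded.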