race condition at asynchronous task assignment during runtime

Dear Matlab community,
I am trying to figure out a problem that I expected to be easy but that is proving me wrong... Basically, I am trying to solve a race condition that arises when using the Parallel Computing Toolbox.
Imagine I have N tasks and G GPUs. Each task takes a different amount of time to complete, and this time can only be determined at runtime. Thus, I don't want to preassign N/G tasks to each GPU, as this would lead to an unbalanced workload distribution. Instead, I wish to launch an spmd block of G labs (each one controlling a GPU) so that, whenever a lab finishes its assigned task, a new one is assigned to it at runtime, until all N tasks have been finished.
The problem with this concept is that each lab needs to know which tasks have already been assigned to other labs.
In one of the approaches I've tested, I create a file that stores the last task number. The lab reads this file, increases the task number by one and updates the file. Here is the code (for N=3 tasks and G=2 GPUs):
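numberTasks = 3;
lastAssignedTask = 0;
file = 'lastAssignedTask.txt';
dlmwrite(file,lastAssignedTask); % start the counter at 0
spmd(2) % G = 2 labs
    while lastAssignedTask<numberTasks
        % read, increment, write back: nothing stops another lab from reading in between
        lastAssignedTask = dlmread(file);
        fprintf('Lab:%d read that the last assigned task was %d \n',labindex,lastAssignedTask);
        taskForThisLab = lastAssignedTask+1;
        lastAssignedTask = taskForThisLab;
        dlmwrite(file,lastAssignedTask);
        fprintf('task: %d Lab:%d \n',lastAssignedTask,labindex);
    end
end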
However, the output is
Lab 1:
Lab:1 read that the last assigned task was 0
task: 1 Lab:1
Lab:1 read that the last assigned task was 1
task: 2 Lab:1
Lab:1 read that the last assigned task was 2
task: 3 Lab:1
Lab 2:
Lab:2 read that the last assigned task was 0
task: 1 Lab:2
Lab:2 read that the last assigned task was 1
task: 2 Lab:2
Lab:2 read that the last assigned task was 2
task: 3 Lab:2
It looks like each worker reads the file at the same time, so both labs end up taking the same tasks. I have tried several workarounds (see the tags), but none seems to be useful for this problem. Is there something very obvious that I am overlooking?
Arabarra on 30 Dec 2020
Edited: Arabarra on 30 Dec 2020
I see... that's an elegant solution, thanks for pointing it out to me. I attach my version of the code below (with all the fprintf calls that I needed to debug it until it worked!), in case it might help others.
I hope future Matlab editions will offer some native tools to handle race conditions in a simpler manner.
% Note: the spmd/end lines, labSend calls and fprintf arguments were lost from
% the posted text and are restored here from context.
numberTasks = 50;
numberLabs = 13; % one lab more than GPUs, last one is the controller
spmd(numberLabs)
    controller = numberLabs;
    taskInLab = zeros(numberLabs-1,1);
    % the k-th position in taskInLab being a zero means that the k-th lab is free
    % just for reference
    finishedJobsInLab = zeros(numberLabs-1,1);
    lastAssignedTask = 0;
    completedTasks = 0;
    if labindex == controller
        while completedTasks<numberTasks
            freeLabs = find(taskInLab==0);
            if length(freeLabs)>0
                for i=1:length(freeLabs)
                    K = freeLabs(i);
                    lastAssignedTask = lastAssignedTask+1;
                    if lastAssignedTask>numberTasks
                        break; % nothing left to assign
                    end
                    taskInLab(K) = lastAssignedTask;
                    taskToPerform = taskInLab(K);
                    targetLab = K;
                    fprintf(' <- [Coordinator] about to send task:%d goes to lab:%d \n',...
                        taskToPerform,targetLab);
                    labSend(taskToPerform,targetLab);
                    fprintf(' >- [Coordinator] task:%d goes to lab:%d \n',...
                        taskToPerform,targetLab);
                end
            else
                % no lab is free; everybody is busy
            end
            % waits till ANY lab reports finishing
            completedTasks = sum(finishedJobsInLab);
            if completedTasks<numberTasks
                fprintf(' [coordinator] Awaiting for some job to finish (%d launched in this round) %d completed in total \n',...
                    length(freeLabs),completedTasks);
                passedCell = labReceive('any');
                labThatFinished = passedCell{1};
                fprintf(' [coordinator] finished task %d in labindex %d \n',...
                    passedCell{2},labThatFinished);
                finishedJobsInLab(labThatFinished) = finishedJobsInLab(labThatFinished)+1;
                taskInLab(labThatFinished) = 0; % marks the lab as finished and free
            else
                % finished; all is good
            end
        end
        fprintf(' [coordinator] %d completed in total \n',completedTasks);
        for i=1:(numberLabs-1)
            labSend(-1,i); % a negative task number tells the lab to stop
        end
    end
    if labindex<controller
        while lastAssignedTask<numberTasks
            fprintf(' * [worker %d] viewed last assigned task: %d \n',labindex,lastAssignedTask);
            lastAssignedTask = labReceive(controller);
            thisTask = lastAssignedTask;
            if thisTask<0
                % this lab has finished
                fprintf( '* [worker %d] finished \n',labindex);
                break;
            end
            fprintf(' (-[worker %d] task: %d \n',labindex,thisTask);
            % (the actual work for thisTask goes here)
            % reports finishing
            passedInfo = {labindex,thisTask};
            labSend(passedInfo,controller);
            fprintf(' -)[worker %d] task: %d [finish signal sent] \n',labindex,thisTask);
        end
        fprintf('[worker %d] final task: %d \n',labindex,lastAssignedTask);
    else
        % do nothing, labindex is bigger than controller
    end
end


Accepted Answer

Walter Roberson on 30 Dec 2020
To prevent race conditions, instead of having each lab work independently, you can use an extra lab as a controller that manages the task assignments. The labs wait for work with a labReceive(), and notify the controller that they are done with a labSend(). The controller assigns work to any lab that does not have work, and then does a labReceive() to wait for a response. When there is no more work, the controller signals shutdown.
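The controller pattern described above could be sketched roughly as follows. This is only a minimal illustration under some assumptions, not a definitive implementation: the last lab is taken to be the controller, a task number of -1 is used as the shutdown signal, numberTasks = 10 is an arbitrary value, doneBy is a hypothetical bookkeeping array, and pause stands in for the real work. It needs a pool with at least two labs.

```matlab
numberTasks = 10;   % hypothetical number of tasks, for illustration only

spmd
    controller = numlabs;              % assumption: the last lab only coordinates
    doneBy = zeros(1, numberTasks);    % doneBy(t) = lab that completed task t
    if labindex == controller
        nextTask = 0;
        busy = 0;                      % number of workers currently holding a task
        % Give every worker a first task, or a stop signal (-1) if none is left.
        for w = 1:numlabs-1
            if nextTask < numberTasks
                nextTask = nextTask + 1;
                labSend(nextTask, w);
                busy = busy + 1;
            else
                labSend(-1, w);
            end
        end
        % Each time a worker reports back, hand it the next task or stop it.
        while busy > 0
            [finishedTask, w] = labReceive('any');
            doneBy(finishedTask) = w;
            if nextTask < numberTasks
                nextTask = nextTask + 1;
                labSend(nextTask, w);
            else
                labSend(-1, w);
                busy = busy - 1;
            end
        end
    else
        while true
            task = labReceive(controller);   % blocks until the controller sends
            if task < 0
                break;                       % stop signal
            end
            pause(0.1 * rand);               % placeholder for the real GPU work
            labSend(task, controller);       % report completion
        end
    end
end
```

Because only the controller ever increments nextTask, no two labs can be handed the same task number, which is what removes the race. The cost is one lab that does no computation of its own.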
