Parallel calculations for Deep Learning Toolbox

Oleg Kyrmyzy on 18 Apr 2020
Answered: Joss Knight on 19 Apr 2020
Greetings!
I have a problem with parallel training of a YOLO detector based on the ResNet-50 network.
For training, I use a virtual machine with 32 cores and no GPU attached. In Parallel Preferences, I set the number of workers to 8.
After running the code, I get the following error:
Error using nnet.internal.cnn.DistributedDispatcher (line 79)
'nnet.internal.cnn.GeneralDatastoreDispatcher' does not support order-preserving distribution.
Error in nnet.internal.cnn.DataDispatcherFactory>iCreateDistributedDispatcherIfRequired (line 204)
dispatcher = nnet.internal.cnn.DistributedDispatcher( dispatcher, executionSettings.workerLoad, retainDataOrder );
Error in nnet.internal.cnn.DataDispatcherFactory.createDataDispatcherMIMO (line 176)
dispatcher = iCreateDistributedDispatcherIfRequired(...
Error in vision.internal.cnn.trainNetwork>iCreateTrainingDataDispatcher (line 180)
dispatcher = nnet.internal.cnn.DataDispatcherFactory.createDataDispatcherMIMO( ...
Error in vision.internal.cnn.trainNetwork (line 34)
trainingDispatcher = iCreateTrainingDataDispatcher(ds, mapping, trainedNet,...
Error in trainYOLOv2ObjectDetector>iTrainYOLOv2 (line 391)
[yolov2Net, info] = vision.internal.cnn.trainNetwork(...
Error in trainYOLOv2ObjectDetector (line 187)
[net, info] = iTrainYOLOv2(ds, lgraph, params, mapping, options, checkpointSaver);
Error in YOLO_Multi_ver (line 83)
[detector,info] = trainYOLOv2ObjectDetector(preprocessedTrainingData,lgraph,options);
My training options are:
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 16, ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 20, ...
    'CheckpointPath', tempdir, ...
    'Shuffle', 'never', ...
    'ExecutionEnvironment', 'parallel');

Answers (1)

Joss Knight on 19 Apr 2020
Sorry about this not-very-good error message, which should be fixed in the current release. What it means is that 'Shuffle', 'never' is not supported for your input data when training in parallel: when the data is distributed to the workers, there is no way to guarantee that it is divided so that the exact same sequence of observations is read. To fix it, change to 'Shuffle', 'once'.
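For reference, here is the question's own training call with only the Shuffle setting changed; the variable names (preprocessedTrainingData, lgraph) are taken from the code in the question:

% Same options as in the question, with 'Shuffle' changed from 'never'
% to 'once', which is supported when training in parallel.
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 16, ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 20, ...
    'CheckpointPath', tempdir, ...
    'Shuffle', 'once', ...
    'ExecutionEnvironment', 'parallel');

% Train exactly as before; only the options object differs.
[detector, info] = trainYOLOv2ObjectDetector(preprocessedTrainingData, lgraph, options);

With 'Shuffle', 'once' the data is shuffled a single time before training, so the order-preserving distribution that the dispatcher cannot provide is no longer required.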
