[bug?] R2018a trainNetwork: accuracy suddenly dropped with multi-GPU training
When training with trainNetwork, the accuracy suddenly dropped and never recovered. These are my training options:
options = trainingOptions('sgdm','Momentum', 0.9,'InitialLearnRate', 1e-3,'L2Regularization', 0.0005,'MaxEpochs', 20000, 'MiniBatchSize',4,'Shuffle', 'every-epoch', 'CheckpointPath',newdirstoragetraicheckpoint, 'ExecutionEnvironment','multi-gpu','Plots','training-progress', 'VerboseFrequency', 2);
| Epoch | Iteration | Time Elapsed (hh:mm:ss) | Mini-batch Accuracy | Mini-batch Loss | Base Learning Rate |
| 2320 | 39426 | 10:23:50 | 82.76% | 0.2362 | 0.0010 |
| 2320 | 39428 | 10:23:52 | 83.29% | 0.2832 | 0.0010 |
| 2320 | 39430 | 10:23:54 | 25.52% | 3.1097 | 0.0010 |
| 2320 | 39432 | 10:23:56 | 27.04% | 3.0014 | 0.0010 |
| 2320 | 39434 | 10:23:58 | 23.22% | 2.9561 | 0.0010 |
I never had this issue in R2017b, so I suspect it has something to do with the new trainNetwork in R2018a. One thing I noticed is that R2017b didn't introduce multi-GPU support for 'ExecutionEnvironment'; could this be the reason? I'm currently running the same script again in R2017b with 'ExecutionEnvironment' set to 'gpu' to see whether the drop happens there too.
2 Comments
Joss Knight
on 14 Apr 2018
Nothing obvious changed in multi-GPU training between R2017b and R2018a, although NCCL was upgraded. What happens if you take the most recent checkpoint saved before the loss jumped and feed the layers from that network back into training? Does the same thing happen?
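A minimal sketch of that experiment, assuming a SeriesNetwork trained from an image datastore. The file name and the variable `imdsTrain` are hypothetical placeholders; `newdirstoragetraicheckpoint` and `options` are the checkpoint folder and training options from the original script. Checkpoint MAT-files written via 'CheckpointPath' store the network in a variable named `net`.

```matlab
% Hypothetical file name: substitute the last checkpoint written before
% iteration 39430, where the loss jumped.
checkpointFile = fullfile(newdirstoragetraicheckpoint, 'net_checkpoint_before_jump.mat');

% Checkpoint MAT-files store the partially trained network in variable 'net'.
checkpoint = load(checkpointFile, 'net');

% For a SeriesNetwork, pass its layer array (with trained weights) back in.
% For a DAGNetwork, use layerGraph(checkpoint.net) instead.
layers = checkpoint.net.Layers;

% imdsTrain is a hypothetical name for the original training datastore;
% options is the trainingOptions object from the original script.
net = trainNetwork(imdsTrain, layers, options);
```

Because trainNetwork starts a fresh run from these weights, the epoch counter and any learning rate schedule start over from the beginning.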
This sort of behaviour isn't unheard of: the loss landscape can be non-smooth near the solution, so you can suddenly step to a bad solution with no means of escaping that local minimum. You may simply have been unlucky, and this may never happen again. Try lowering the learning rate, or use a learning rate drop schedule so that the rate is lower by the time you reach this unstable region.
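A sketch of the drop-schedule suggestion using the 'piecewise' learning rate schedule of trainingOptions. The drop factor (0.5) and period (1000 epochs) are illustrative assumptions, not tuned values; everything else matches the original options. Alternatively, keep a constant rate and simply lower 'InitialLearnRate' (for example to 1e-4).

```matlab
% Same options as the original run, plus a piecewise learning rate schedule.
% LearnRateDropFactor and LearnRateDropPeriod values are illustrative only.
options = trainingOptions('sgdm', ...
    'Momentum', 0.9, ...
    'InitialLearnRate', 1e-3, ...
    'LearnRateSchedule', 'piecewise', ...  % multiply the rate by a factor periodically
    'LearnRateDropFactor', 0.5, ...        % assumed: halve the rate at each drop
    'LearnRateDropPeriod', 1000, ...       % assumed: drop every 1000 epochs
    'L2Regularization', 0.0005, ...
    'MaxEpochs', 20000, ...
    'MiniBatchSize', 4, ...
    'Shuffle', 'every-epoch', ...
    'CheckpointPath', newdirstoragetraicheckpoint, ...
    'ExecutionEnvironment', 'multi-gpu', ...
    'Plots', 'training-progress', ...
    'VerboseFrequency', 2);
```

After k drops the rate is InitialLearnRate × LearnRateDropFactor^k, so with these values it would already be 1e-3 × 0.5^2 = 2.5e-4 by epoch 2320, where the jump occurred. Over 20000 epochs the same settings shrink the rate to roughly 1e-9, so in practice the period or factor would need adjusting.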
Answers (0)
Category: Image Data Workflows