How to avoid NaN in the mini-batch loss when training a convolutional neural network?
Hello,
I'm working on training a convolutional neural network following the example from https://de.mathworks.com/help/nnet/examples/create-simple-deep-learning-network-for-classification.html. I have 2000 images in each of 8 label categories and use 90% for training and 10% for testing. The images are in .jpg format and have a size of 512x512x1. The architecture of the CNN is currently as follows:
layers = [imageInputLayer([512 512 1])
convolution2dLayer(5,15)
reluLayer
maxPooling2dLayer(2,'Stride',2)
fullyConnectedLayer(8)
softmaxLayer
classificationLayer()];
options = trainingOptions('sgdm','MaxEpochs',15,'InitialLearnRate',0.001, 'ExecutionEnvironment', 'parallel' );
After training the first epoch, the mini-batch loss becomes NaN and the accuracy stays around chance level. The reason for this is probably that backpropagation generates NaN weights.

How can I avoid this problem? Thanks for the answers!
9 Comments
Javier Pinzón
on 28 Apr 2017
I do not know if anywhere in your code you initialized the weights of the first convolutional layer... it is highly recommended, as is initializing the fully connected layers. Also take into account how the input fits the size of the FC layer: remember that its input size is H*W*(number of filters of the last conv layer). Look at these possibilities and tell me if it works.
ismail alansary
on 24 May 2017
Edited: ismail alansary
on 24 May 2017
Hello, I just wonder if you have solved this problem?
I am new to CNNs, so if initializing the weights of the first convolutional layer could help, please tell me how to initialize it if I am using the same example.
Javier Pinzón
on 26 May 2017
Hello Ismail, I recommend that you always start a new question if you need help. But let's answer =).
To initialize the weights, you need to define the convolution layer before the Layer struct:
conv1 = convolution2dLayer(F,D,'Padding',0,...
'BiasLearnRateFactor',2,...
'Name','conv1');
conv1.Weights = gpuArray(single(randn([F F 3 D])*0.0001));
conv1.Bias = gpuArray(single(randn([1 1 D])*0.00001+1));
You can initialize the weights and the bias if needed. Remember, D is the number of filters to be used and F the size of the filter. Then, reference your variable in the layers array:
layers = [ ...
imageInputLayer([128 128 3]);
conv1;
and that is all. Hope it helps
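Putting these pieces together for the asker's case: a minimal sketch assuming the question's values (F=5, D=15, grayscale input, so the weight array's third dimension is 1 rather than the 3 used for RGB above). The gpuArray wrapping is dropped here on the assumption that trainNetwork moves parameters to the GPU itself:

```matlab
F = 5;   % filter size (from the question's architecture)
D = 15;  % number of filters (from the question's architecture)

% Named convolution layer so it can be referenced and inspected later
conv1 = convolution2dLayer(F, D, 'Padding', 0, ...
    'BiasLearnRateFactor', 2, ...
    'Name', 'conv1');

% Small random initial values keep early activations and gradients bounded;
% the input is grayscale, so the weights are [F F 1 D] instead of [F F 3 D]
conv1.Weights = single(randn([F F 1 D]) * 0.0001);
conv1.Bias    = single(randn([1 1 D]) * 0.00001 + 1);

layers = [imageInputLayer([512 512 1])
          conv1
          reluLayer
          maxPooling2dLayer(2, 'Stride', 2)
          fullyConnectedLayer(8)
          softmaxLayer
          classificationLayer()];
```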
AlexanderTUE
on 9 Jun 2017
Johannes Stegmaier
on 24 Jul 2017
Edited: Johannes Stegmaier
on 24 Jul 2017
Hi everyone,
I just stumbled over the same problem and for me the problem was that the input images were stored at 16 bit depth (32x32x1, png files). After converting all the training / test images to 8 bit it worked without any issues. Maybe that solves it for you, too?
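If 16-bit inputs are indeed the culprit, a sketch of the conversion Johannes describes might look like the following (the folder name is a placeholder and the use of im2uint8, which rescales the uint16 range [0, 65535] down to uint8 [0, 255], is an assumption, not something stated in the post):

```matlab
% Convert every 16-bit PNG in a folder to 8-bit in place
files = dir(fullfile('trainingImages', '*.png'));  % hypothetical folder
for k = 1:numel(files)
    p  = fullfile(files(k).folder, files(k).name);
    im = imread(p);
    if isa(im, 'uint16')
        imwrite(im2uint8(im), p);  % overwrite with the 8-bit version
    end
end
```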
Best,
Johannes
Javier Pinzón
on 25 Jul 2017
Edited: Javier Pinzón
on 25 Jul 2017
Hello everybody,
Because I have experienced some issues with PNG-format images, I highly recommend using JPG/JPEG format. Sometimes, because a PNG image can contain several layers, only the last layer is taken and the whole image becomes that layer's color, i.e., the image is converted to all black, all red, etc. When you send such images to the network, it only sees a single-color image, nothing related to the rest of the images, and the network will not be able to learn the features. Also be careful with the size of your filters. Johannes' answer might also be the solution in some cases.
Hope it helps,
Javier
_____________________
Edit:
Be careful with the size of your input image. When it is really big, as happened with Alexander, using only one convolution makes it really difficult for the network to learn, because it will have only one set of weights for a really large number of features the network is trying to learn. I would recommend using at least 2 or 3 convolutions for that size, even for a size of 128x128, and using pooling layers to reduce the size that enters the fully connected layer, because that helps it classify the extracted features.
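As one illustration of this advice (the filter counts and sizes below are arbitrary choices for a sketch, not values from the thread; 'Padding','same' assumes a reasonably recent MATLAB release), a deeper stack with pooling after each convolution might look like:

```matlab
% Three conv/ReLU/pool blocks progressively shrink the 512x512 input,
% so the fully connected layer receives far fewer activations
layers = [imageInputLayer([512 512 1])
          convolution2dLayer(5, 16, 'Padding', 'same')
          reluLayer
          maxPooling2dLayer(2, 'Stride', 2)   % 512 -> 256
          convolution2dLayer(5, 32, 'Padding', 'same')
          reluLayer
          maxPooling2dLayer(2, 'Stride', 2)   % 256 -> 128
          convolution2dLayer(3, 64, 'Padding', 'same')
          reluLayer
          maxPooling2dLayer(2, 'Stride', 2)   % 128 -> 64
          fullyConnectedLayer(8)
          softmaxLayer
          classificationLayer()];
```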
AlexanderTUE
on 4 Sep 2017
Javier Pinzón
on 6 Sep 2017
Hello Alexander,
I'm happy that you were able to solve your problem. If you have any questions, feel free to ask.
Greg Heath
on 7 Sep 2017
Comment by Ashok kumar on 6 Jun 2017
MOVED FROM AN ACCEPTED ANSWER BOX
What is the mini-batch loss in the table in the command window, and how is it calculated?
Accepted Answer
More Answers (4)
Khalid Babutain
on 18 Oct 2019
4 votes
I came across this issue because I had it, and I was able to solve it by only lowering the Initial Learning Rate from ('InitialLearnRate',1e-3) to ('InitialLearnRate',1e-5)
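Applied to the original trainingOptions call from the question, that change amounts to the following sketch (all other options kept as in the question):

```matlab
% Lowering the initial learning rate from 1e-3 to 1e-5 keeps early weight
% updates small, which often prevents the loss from diverging to NaN
options = trainingOptions('sgdm', ...
    'MaxEpochs', 15, ...
    'InitialLearnRate', 1e-5, ...
    'ExecutionEnvironment', 'parallel');
```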
3 Comments
Abderrahim Bakir
on 6 Feb 2022
It worked for me, thank you sir
Matthew Luka
on 9 Feb 2022
Thanks a lot, it worked for me too
Kelvin Owusu
on 8 Jan 2024
Yes, this works and helped more than expected. I was getting 29% accuracy while changing parameters, but I reset them and reduced the learning rate, and now I am getting 100% 👏
Matt J
on 5 Dec 2023
1 vote
I found that changing the solver (from "sgdm" to "adam") resolved the problem.
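For example, a minimal sketch of that solver swap (the MaxEpochs and learning-rate values here are placeholders, not from the answer):

```matlab
% Adam adapts the step size per parameter, which often tames gradients
% that make plain SGDM diverge to NaN
options = trainingOptions('adam', ...
    'MaxEpochs', 15, ...
    'InitialLearnRate', 1e-3);
```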
Salma Hassan
on 20 Dec 2017
0 votes
I have this line in my code: [trainedNet,traininfo] = trainNetwork(trainingimages,Layers,opts); When I opened the structure traininfo, I got the values of training accuracy and training loss, but for the validation (accuracy, loss) I got only the first value and the rest is NaN. What is the problem in this case?
2 Comments
Javier Pinzón
on 21 Dec 2017
Hello,
Could you please open a new thread with this question, and provide screenshots of your code and results if possible, so we can see what may be wrong or what is causing the problem?
Thanks
Salma Hassan
on 31 Dec 2017
Mr Javier Pinzón, I posted a separate question titled "What causes NaN values in the validation accuracy and loss from training a convolutional neural network and how to avoid it?" at this link: https://www.mathworks.com/matlabcentral/answers/375090-what-is-causes-nan-values-in-the-validation-accuracy-and-loss-from-traning-convolutional-neural-n
Poorya Khanali
on 10 Feb 2021
0 votes
I have a ResNet; when the image size is 35*60 everything works fine (no NaN during training), but when I change the image size to 59*60 (for different data) the network seems to work at the beginning, and then after some epochs the NaNs start to appear. Could you please help me out!
