Choosing the best set of initial weights of a neural network to train on the entire dataset
Mirko Job on 8 Aug 2019
Commented: Jonathan De Sousa on 30 Jan 2022
I am developing a neural network for pattern recognition in MATLAB.
1) I divide my dataset into 6 folds (5 folds for CV + 1 test fold, which represents unseen data);
2) I choose 10 different numbers of hidden neurons;
3) I choose 10 different sets of random initial weights;
4) For each fold used as the test fold (k):
- For each number of hidden neurons (i):
- - For each set of initial weights (j):
- - - I perform 5-fold CV (4 folds for training and 1 for early stopping), saving the average performance (R^2) on training, validation and test, and the average number of training epochs across all iterations of the cross-validation (the [i,j,k] element of the result matrices);
5) Averaging across the 6 different choices of test fold (k) (10x10x6 -> 10x10), I obtain a general estimate of the different models across the entire dataset considered as unseen data;
6) I choose the optimal number of hidden neurons as the value whose model performs best on average across the 10 sets of initial weights (j);
7) I choose the number of training epochs as the average of the training epochs found across the ten sets of initial weights (j) and all possible choices of test fold (k).
Now I have the number of hidden neurons and the number of epochs to train the final model on all data.
My question is: how should I choose the initial set of weights? Should I again choose ten sets of initial weights and train 10 different networks with the previously defined parameters to find the best one? In that case, since I have no validation or test set, won't the resulting network be overfitted?
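Steps 5 to 7 of the procedure above can be sketched as a small aggregation over the result matrices. This is only an illustration in Python, not the poster's MATLAB code: the function name `select_model`, the toy numbers, and the reading of step 7 as "average the epochs of the chosen hidden size only" are all assumptions.

```python
from statistics import mean

def select_model(r2, epochs, hidden_sizes):
    """Pick a hidden-layer size and an epoch count from CV results.

    r2[i][j][k] and epochs[i][j][k] are indexed by hidden size i,
    initial-weight set j and test fold k (hypothetical layout).
    """
    # Step 5: average over the test folds k, leaving one value per (i, j).
    r2_ij = [[mean(per_fold) for per_fold in per_size] for per_size in r2]
    # Step 6: average over the weight sets j and keep the best hidden size.
    r2_i = [mean(per_size) for per_size in r2_ij]
    best_i = max(range(len(r2_i)), key=r2_i.__getitem__)
    # Step 7 (assumed reading): average the epochs over j and k
    # for the chosen hidden size only, rounded to a whole epoch.
    n_epochs = round(mean(e for per_set in epochs[best_i] for e in per_set))
    return hidden_sizes[best_i], n_epochs

# Toy example: 2 hidden sizes, 2 weight sets, 3 test folds (made-up values).
hidden_sizes = [5, 10]
r2 = [[[0.70, 0.72, 0.74], [0.60, 0.62, 0.64]],
      [[0.80, 0.82, 0.84], [0.78, 0.80, 0.82]]]
epochs = [[[10, 12, 14], [20, 22, 24]],
          [[30, 32, 34], [40, 42, 44]]]
print(select_model(r2, epochs, hidden_sizes))
```

Averaging over j before comparing hidden sizes means the choice reflects how an architecture behaves over many initializations rather than one lucky draw.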
Sourav Bairagya on 12 Aug 2019
The simplest way to initialize weights and biases is to set them to small uniform random values, which works well for neural networks with a single hidden layer. When there is more than one hidden layer, you can use a good initialization scheme such as “Glorot (also known as Xavier) Initialization”.
Since we know nothing about the dataset beforehand, one good approach is to draw the weights from a Gaussian distribution with zero mean and some finite variance. The variance of the signal should stay the same from layer to layer, which keeps the signal from exploding to a high value or vanishing to zero. In other words, the scheme keeps the variance of the input and the output of each hidden layer the same, which helps training converge.
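The variance argument can be made concrete with a short derivation. It is a sketch under simplifying assumptions that are not stated in the thread: a linear unit, and weights and inputs that are independent, zero-mean and identically distributed, with n_in inputs and n_out outputs per layer.

```latex
\begin{align*}
y &= \sum_{i=1}^{n_{\mathrm{in}}} w_i x_i
  \;\Longrightarrow\;
  \operatorname{Var}(y) = n_{\mathrm{in}} \operatorname{Var}(w) \operatorname{Var}(x) \\
\text{forward pass keeps the variance if } & n_{\mathrm{in}} \operatorname{Var}(w) = 1 \\
\text{backward pass keeps the variance if } & n_{\mathrm{out}} \operatorname{Var}(w) = 1 \\
\text{compromise between the two: } & \operatorname{Var}(w) = \frac{2}{n_{\mathrm{in}} + n_{\mathrm{out}}}
\end{align*}
```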
According to the “Glorot/Xavier Initialization process”, the weights are initialized as follows (sketched here in MATLAB, with example layer sizes):
% For each hidden layer with numIn inputs and numOut outputs:
numIn = 10; numOut = 5;              % example sizes, replace with your own
variance = 2.0 / (numIn + numOut);
stddev = sqrt(variance);
W = stddev * randn(numOut, numIn);   % zero-mean Gaussian weights
You can try this approach in your model to initialize the weights prior to training. Because weight initialization does not depend on the dataset, there is no need to choose ten sets of initial weights again and train ten different networks with the previously defined parameters to find the best one.
You can also use “fullyConnectedLayer” from “Deep Learning Toolbox”, whose default weights initializer is ‘glorot’. For more information regarding this you can follow this link:
Jonathan De Sousa on 30 Jan 2022
The Glorot initialisation scheme does not actually use a Gaussian distribution. The weights are sampled rather from a uniform distribution. Have a look at: https://uk.mathworks.com/help/deeplearning/ug/initialize-learnable-parameters-for-custom-training-loop.html#mw_1bd0f2c3-c7df-4841-89ce-a7574d2db8d9
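The Gaussian and uniform variants can be compared side by side. The sketch below is plain Python rather than MATLAB, and the function names are hypothetical. It relies on the fact that a uniform distribution on [-a, a] has variance a^2/3, so the bound a = sqrt(6/(n_in + n_out)) gives the same variance 2/(n_in + n_out) as the Gaussian version in the answer.

```python
import math
import random

def glorot_uniform(n_in, n_out, rng):
    """n_out x n_in weights drawn from U(-a, a) with a = sqrt(6 / (n_in + n_out))."""
    a = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-a, a) for _ in range(n_in)] for _ in range(n_out)]

def glorot_normal(n_in, n_out, rng):
    """n_out x n_in weights drawn from N(0, std^2) with std = sqrt(2 / (n_in + n_out))."""
    std = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]

def sample_variance(matrix):
    """Population variance of all entries of a nested-list matrix."""
    values = [w for row in matrix for w in row]
    m = sum(values) / len(values)
    return sum((w - m) ** 2 for w in values) / len(values)

rng = random.Random(0)      # fixed seed so the draw is reproducible
n_in, n_out = 200, 100      # example layer sizes (assumed)
print("target variance :", 2.0 / (n_in + n_out))
print("uniform variant :", sample_variance(glorot_uniform(n_in, n_out, rng)))
print("normal variant  :", sample_variance(glorot_normal(n_in, n_out, rng)))
```

Both variants target the same weight variance. They differ only in the shape of the distribution, and the uniform one also bounds every weight by a.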