Choosing the best set of initial weights of a neural network to train all dataset

Question

Mirko Job on 8 Aug 2019

Direct link to this question:

https://it.mathworks.com/matlabcentral/answers/475370-choosing-the-best-set-of-initial-weights-of-a-neural-network-to-train-all-dataset

Commented: Jonathan De Sousa on 30 Jan 2022

Accepted answer: Sourav Bairagya

I am developing a neural network for pattern recognition in MATLAB.

Currently:

1) I divide my dataset into 6 folds (5 folds for cross-validation plus 1 test fold, which represents unseen data);

2) I choose 10 different numbers of hidden neurons;

3) I choose 10 different sets of random initial weights;

4) For each fold used as the test set (k):

- For each number of hidden neurons (i):

- - For each set of initial weights (j):

- - - I perform 5-fold CV (4 folds for training and 1 for early stopping), saving the average performance (R^2) on training, validation and test, and the average number of training epochs across all iterations of the cross-validation (element [i,j,k] of the result matrices);

5) Averaging across the 6 different choices of test fold (k) (10x10x6 -> 10x10), I obtain a general estimate of the different models across the entire dataset, considered as unseen data;

6) I choose the optimal number of hidden neurons as the value whose model performs best on average across the 10 sets of initial weights (j);

7) I choose the number of training epochs as the average number of training epochs found across the ten sets of initial weights (j) and all possible choices of test set (k);
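The aggregation in steps 5 to 7 can be sketched in plain Python as follows. This is only an illustration, not the poster's code: the function name `select_model`, the nested-list layout `r2[i][j][k]` / `epochs[i][j][k]`, the use of the test R^2 as the score, and the choice to average epochs only for the selected number of hidden neurons are all assumptions.

```python
from statistics import mean

def select_model(r2, epochs, hidden_sizes):
    """Pick a hidden-layer size and an epoch count from CV results.

    r2[i][j][k] and epochs[i][j][k] are indexed by hidden-size index i,
    initial-weight-set index j, and test-fold index k (assumed layout).
    """
    # Step 5: average over the test folds k -> one score per (i, j)
    r2_ij = [[mean(over_k) for over_k in over_j] for over_j in r2]
    # Step 6: average over the weight sets j, then take the best hidden size
    r2_i = [mean(row) for row in r2_ij]
    best_i = max(range(len(r2_i)), key=r2_i.__getitem__)
    # Step 7 (assumed to use only the chosen hidden size):
    # average the epoch counts over all j and k, rounded to an integer
    n_epochs = round(mean(e for over_k in epochs[best_i] for e in over_k))
    return hidden_sizes[best_i], n_epochs
```

With the sizes from the question, `r2` and `epochs` would each hold 10 x 10 x 6 values.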

Now I have the number of hidden neurons and the number of epochs to train the final model on all the data.

My question is: how should I choose the initial set of weights? Should I again choose ten sets of initial weights and train 10 different networks with the previously defined parameters to find the best one? In that case, since I no longer have validation and test sets, won't the resulting network be overfitted?


Answer 1

Sourav Bairagya on 12 Aug 2019

Direct link to this answer:

https://it.mathworks.com/matlabcentral/answers/475370-choosing-the-best-set-of-initial-weights-of-a-neural-network-to-train-all-dataset#answer_387226

The simplest way to initialize weights and biases is to set them to small uniform random values, which works well for neural networks with a single hidden layer. When there is more than one hidden layer, you can use a more principled initialization scheme such as “Glorot (also known as Xavier) Initialization”.

Because we know nothing about the dataset beforehand, a good approach is to draw the weights from a Gaussian distribution with zero mean and a finite variance, chosen so that the variance of the signal stays roughly the same from one layer to the next. This keeps the signal from exploding to a large value or vanishing to zero. In other words, it keeps the variance at the input and at the output of each hidden layer roughly equal, which helps the network train reliably (it stabilizes training rather than acting as a safeguard against overfitting).
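The variance argument can be made concrete with a short derivation. It is a sketch under simplifying assumptions that are not stated in the answer: a layer in its linear regime, and weights and inputs that are independent with zero mean. The symbols $n_\text{in}$ and $n_\text{out}$ stand for the number of inputs and outputs of the layer.

```latex
\begin{align}
y_j &= \sum_{i=1}^{n_\text{in}} w_{ji}\, x_i
  &&\Rightarrow\quad
  \operatorname{Var}(y_j) = n_\text{in}\, \operatorname{Var}(w)\, \operatorname{Var}(x) \\
\text{forward pass:}\quad
  \operatorname{Var}(y_j) = \operatorname{Var}(x)
  &&&\Rightarrow\quad
  \operatorname{Var}(w) = \frac{1}{n_\text{in}} \\
\text{backward pass:}\quad
  &&&\phantom{\Rightarrow}\quad
  \operatorname{Var}(w) = \frac{1}{n_\text{out}} \\
\text{compromise:}\quad
  &&&\phantom{\Rightarrow}\quad
  \operatorname{Var}(w) = \frac{2}{n_\text{in} + n_\text{out}}
\end{align}
```

The two conditions can only hold together when $n_\text{in} = n_\text{out}$, so the scheme takes the compromise in the last line.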

With “Glorot/Xavier Initialization”, the weights are initialized as follows (written as pseudo-code):

for each hidden layer weight:
    variance = 2.0 / (number of inputs + number of outputs);
    stddev = sqrt(variance);
    weight = gaussian(mean = 0.0, stddev);
end for
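A runnable version of this pseudo-code, as a minimal sketch in plain Python rather than MATLAB. The function name `glorot_normal`, the list-of-rows matrix layout, and the layer sizes in the example are illustrative assumptions; biases are not handled here.

```python
import math
import random

def glorot_normal(n_in, n_out, rng=None):
    """Return an n_out x n_in weight matrix (list of rows) drawn from a
    zero-mean Gaussian with variance 2 / (n_in + n_out)."""
    rng = rng or random.Random()
    stddev = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, stddev) for _ in range(n_in)]
            for _ in range(n_out)]

# Example: a hidden layer with 20 inputs and 10 neurons (illustrative sizes);
# passing a seeded generator makes the draw reproducible.
W = glorot_normal(20, 10, random.Random(0))
```

Seeding the generator is optional; it only matters if the same initial weights have to be reproduced later.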

You can try this approach in your model to initialize the weights before training. Because this weight initialization does not depend on the dataset, there is no need to choose ten sets of initial weights again and train ten different networks with the previously defined parameters to find the best one.

You can also use “fullyConnectedLayer” from “Deep Learning Toolbox”, whose default weights initializer is ‘glorot’. For more information, see:

https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.fullyconnectedlayer.html

1 Comment

Jonathan De Sousa on 30 Jan 2022

The Glorot initialisation scheme does not actually use a Gaussian distribution. The weights are sampled rather from a uniform distribution. Have a look at: https://uk.mathworks.com/help/deeplearning/ug/initialize-learnable-parameters-for-custom-training-loop.html#mw_1bd0f2c3-c7df-4841-89ce-a7574d2db8d9
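For comparison, the uniform variant can be sketched as below, in plain Python rather than MATLAB. The bound sqrt(6 / (n_in + n_out)) is chosen so that the uniform distribution has variance 2 / (n_in + n_out), the same variance as in the Gaussian pseudo-code of the answer, since a uniform distribution on [-a, a] has variance a^2 / 3. The function name `glorot_uniform` and the list-of-rows layout are assumptions made for this illustration.

```python
import math
import random

def glorot_uniform(n_in, n_out, rng=None):
    """Return an n_out x n_in weight matrix (list of rows) drawn uniformly
    from [-limit, limit] with limit = sqrt(6 / (n_in + n_out))."""
    rng = rng or random.Random()
    limit = math.sqrt(6.0 / (n_in + n_out))
    return [[rng.uniform(-limit, limit) for _ in range(n_in)]
            for _ in range(n_out)]
```

Both variants therefore target the same weight variance and differ only in the shape of the distribution.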
