Leave-out-one cross validation during neural network training

Hello there, I am trying to train an ML model with leave-one-out trial cross validation. Right now I have input data stored in a 1x10 cell array, "XTrain", with the cells containing the prediction inputs for the 10 trials, and another 1x10 cell array, "YTrain", that contains the corresponding continuous variable we are trying to predict/output.
net = connectLayers(net,outputName,"fc");
% Specify training options
options = trainingOptions('adam', ...
    'MaxEpochs', 60, ...
    'MiniBatchSize', 1, ...
    'SequenceLength', 'longest', ...
    'InputDataFormats', 'CTB', ...
    'Plots', 'training-progress', ...
    'Metrics', 'rmse', ...
    'Verbose', 0);
% Train the network
net = trainnet(XTrain,YTrain,net,"mse",options)
I have built my model's network architecture, stored in "net", but I am unsure how to incorporate "leave-one-out trial" validation during training and then test my model's performance. I want the model to pull out one trial at a time, train on the rest, and repeat this for all 10 trials so that I end up with one final network trained and validated across all the trials. But I also want to have data to test the model's performance. Do I need to create a loop for this, or is there a way I can specify it in the training options? Any help would be greatly appreciated!

9 Comments

Hi Isabelle,
To incorporate "leave-one-out" validation in your MATLAB neural network training process, you need to modify your training loop to exclude one trial at a time for validation while training on the remaining data. This ensures that your model is trained and validated on all trials individually, giving a robust evaluation of its performance. To achieve this, set the number of trials to 10 and iterate through each trial. On each iteration, split the data into training and validation sets, excluding one trial for validation, and define the training options for the neural network. Train the network on the remaining trials, then test the model's performance on the excluded trial by making predictions and evaluating performance metrics. By following these guidelines, you should be able to resolve your problem.
Please let me know if you have any further questions.
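The splitting step described above can be sketched like this (a minimal illustration only, assuming your 10 trials live in the 1x10 cell arrays XTrain and YTrain as in the question; the loop-variable and placeholder names are mine):

```matlab
numTrials = 10;                        % one fold per trial
for k = 1:numTrials
    valX = XTrain(k);                  % the held-out trial (still a cell)
    valY = YTrain(k);
    trainIdx = [1:k-1, k+1:numTrials]; % indices of the other 9 trials
    trainX = XTrain(trainIdx);
    trainY = YTrain(trainIdx);
    % ... train on trainX/trainY, validate on valX/valY ...
end
```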
Hi Umar, thank you for your response. This is very helpful. I am trying to build my loop, but I am stuck on how to pull out the other 9 trials aside from the one pulled out for validation and store them in a variable, "trainingdataX", shown here:
% Train the network
for i = 1:length(XTrain)
validationdataX = XTrain(i)
validationdataY = YTrain(i)
trainingdataX = ?? % this is where I am unsure how to pull out all the other 9 trials not used for validation!!
trainingdataY = ??
options = trainingOptions("adam", ...
MaxEpochs=60, ...
MiniBatchSize=1, ...
InputDataFormats="CTB", ...
Plots="training-progress", ...
Metrics="rmse", ...
Verbose=0,...
ValidationData={validationdataX,validationdataY} ...
);
net = trainnet(trainingdataX,trainingdataY,net,"mse",options)
end

Hi Isabelle,

No problem. To extract the training data, exclude the index used for validation from the set of all indices. You can achieve this by creating a vector of indices that excludes the current validation index, as in this modified version of your code snippet:

% Train the network

for i = 1:length(XTrain) % Iterate over all data points

    validationdataX = XTrain(i);
    validationdataY = YTrain(i);
    % Exclude the current index (i) for training
    trainingIndices = setdiff(1:length(XTrain), i);
    trainingdataX = XTrain(trainingIndices);
    trainingdataY = YTrain(trainingIndices);
    options = trainingOptions("adam", ...
        'MaxEpochs', 60, ...
        'MiniBatchSize', 1, ...
        'InputDataFormats', "CTB", ...
        'Plots', "training-progress", ...
        'Metrics', "rmse", ...
        'Verbose', 0, ...
        'ValidationData', {validationdataX, validationdataY} ...
    );
    net = trainnet(trainingdataX, trainingdataY, net, "mse", options);
end

In this modified code snippet, the `setdiff` function excludes the current index (i) from the list of indices, so `trainingdataX` and `trainingdataY` are populated with the remaining 9 trials for training, and the loop iterates over all trials, holding out one at a time for validation. By implementing this approach, you should be able to train your neural network model with the desired separation of training and validation data.

Let me know if you need further clarification or assistance with this problem-solving task.
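One caveat worth flagging (this is an assumption about the intended workflow, not something from the snippet above): because the trained `net` is passed back into `trainnet` on each iteration, every fold continues training from the previous fold's weights instead of starting fresh. A hedged sketch that starts each fold from the untrained architecture and records one RMSE per fold:

```matlab
% Assumes XTrain/YTrain are 1x10 cell arrays, "net" holds the untrained
% architecture, and "options" is defined as in the snippet above.
baseNet = net;                        % keep the untrained architecture
numTrials = length(XTrain);
foldRMSE = zeros(1, numTrials);
for i = 1:numTrials
    trainingIndices = setdiff(1:numTrials, i);
    foldNet = trainnet(XTrain(trainingIndices), YTrain(trainingIndices), ...
        baseNet, "mse", options);     % fresh start for every fold
    pred = minibatchpredict(foldNet, XTrain(i), InputDataFormats="CTB");
    foldRMSE(i) = rmse(pred, YTrain{i}, "all");
end
meanRMSE = mean(foldRMSE)             % overall generalization estimate
```

In leave-one-out cross validation, the mean of the per-fold RMSEs is the usual performance estimate; a final model is then typically retrained on all 10 trials.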

This is great, thank you so very much! Your help is greatly appreciated!
No problem, Isabelle. So, is this the answer you were seeking?
Hi Umar, I have one additional question. I am trying to test my model's performance and see how well it does with unseen data. Would I make predictions and calculate the RMSE values within the loop as shown here in order to do this properly?
for i = 1:length(XTrain)
validationdataX = XTrain(i);
validationdataY = YTrain(i);
% Exclude the current index (i) for training
trainingIndices = setdiff(1:length(XTrain), i);
trainingdataX = XTrain(trainingIndices);
trainingdataY = YTrain(trainingIndices);
options = trainingOptions("adam", ...
'MaxEpochs', 60, ...
'MiniBatchSize', 1, ...
'InputDataFormats', "CTB", ...
'Plots', "training-progress", ...
'Metrics', "rmse", ...
'Verbose', 0, ...
'ValidationData', {validationdataX, validationdataY} ...
);
net = trainnet(trainingdataX, trainingdataY, net, "mse", options);
%Test model and get rmse values
Predval = minibatchpredict(net,validationdataX, InputDataFormats="CTB");
TrueVal = validationdataY{1};
RMSE = rmse(Predval,TrueVal)
end
Hi Isabelle,
Making predictions and calculating the RMSE values within the loop is a valid approach, but consider alternative methods such as the built-in `crossval` function, which automates splitting the data, training the model, and evaluating performance metrics such as RMSE, streamlining the workflow and reducing the likelihood of errors.
For more information on this function, please refer to:
https://www.mathworks.com/help/stats/crossval.html
Also, k-fold cross-validation or stratified cross-validation can provide more robust estimates of model performance than a simple loop-based approach, and techniques such as grid search or random search for hyperparameter optimization can help identify the best configuration for your neural network model.
For more information, see:
https://startupmarketingandcontentworld.quora.com/What-is-the-difference-between-k-fold-cross-validation-and-stratified-k-fold-cross-validation#:~:text=In%20summary%2C%20the%20difference%20between,into%20folds%20for%20model%20evaluation.
https://www.mathworks.com/matlabcentral/fileexchange/71546-simple-deep-learning-algorithms-with-k-fold-cross-validation?s_tid=srchtitle_support_results_1_k-fold%20cross-validation%20
In a nutshell, using a loop for cross-validation is a valid approach, but considering built-in cross-validation functions and exploring hyperparameter optimization techniques can lead to more efficient and reliable model evaluation. Good luck!
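As a concrete starting point for the built-in route, here is a sketch (assuming the Statistics and Machine Learning Toolbox is available): `cvpartition` with the 'LeaveOut' option generates exactly the one-trial-held-out folds discussed in this thread.

```matlab
numTrials = numel(XTrain);            % the 10 trials from the question
c = cvpartition(numTrials, 'LeaveOut');
for k = 1:c.NumTestSets
    trainIdx = training(c, k);        % logical mask: the 9 training trials
    testIdx  = test(c, k);            % logical mask: the held-out trial
    trainX = XTrain(trainIdx);  trainY = YTrain(trainIdx);
    testX  = XTrain(testIdx);   testY  = YTrain(testIdx);
    % ... train with trainnet and evaluate on testX/testY as above ...
end
```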
Thank you so much! You are incredibly helpful!
No problem Isabelle, glad to help out.


Answers (0)

Asked: 17 Jul 2024

Commented: 19 Jul 2024
