Model Selection using Hold-out validation

Question

Tan il 30 Apr 2024

0
Link

Link diretto a questa domanda

https://it.mathworks.com/matlabcentral/answers/2113456-model-selection-using-hold-out-validation

Risposto: Gojo il 12 Mag 2024

Project overview: Total 1200 images for three classes and need to separate into training, validation and test set with the ratio of 70:20:10. And need to perform model selection and evaluation.

Does the coding provide corrected? Below is the coding:

clc

clear all;

close all;

% Open file in MATLAB

outputFolder = fullfile('D:\recycle101');

rootFolder = fullfile(outputFolder, 'project');

categories = {'aluminiumcan', 'petbottle', 'drinkcartonbox'};

% Load dataset

imds = imageDatastore(fullfile(rootFolder, categories), 'LabelSource', 'foldernames');

% Determine the number of images

numImages = numel(imds.Files);

% Define ratios for splitting the dataset

trainRatio = 0.7;

valRatio = 0.2;

testRatio = 0.1;

% Calculate the number of images for each set

numTrain = round(trainRatio * numImages);

numVal = round(valRatio * numImages);

numTest = numImages - numTrain - numVal;

% Shuffle the dataset

imds = shuffle(imds);

% Split the dataset into training, validation, and test sets

imdsTrain = subset(imds, 1:numTrain);

imdsValidation = subset(imds, numTrain+1:numTrain+numVal);

imdsTest = subset(imds, numTrain+numVal+1:numTrain+numVal+numTest);

% Display dataset counts

disp('Training Set:');

tblTrain = countEachLabel(imdsTrain)

disp('Validation Set:');

tblValidation = countEachLabel(imdsValidation)

disp('Test Set:');

tblTest = countEachLabel(imdsTest)

% Randomly choose images for display

AluminiumCan = randi(numel(imdsTrain.Files));

PETBottles = randi(numel(imdsTrain.Files));

DrinkCartonBox = randi(numel(imdsTrain.Files));

% Plot randomly chosen images

figure

subplot(2,2,1);

imshow(readimage(imdsTrain, AluminiumCan));

title('Aluminium Can');

subplot(2,2,2);

imshow(readimage(imdsTrain, PETBottles));

title('PET Bottle');

subplot(2,2,3);

imshow(readimage(imdsTrain, DrinkCartonBox));

title('Drink Carton Box');

% Load pre-trained network

net = googlenet;

analyzeNetwork(net)

lys = net.Layers;

lys(end-3:end)

numClasses = numel(categories(imdsTrain.Labels));

lgraph = layerGraph(net);

% Replace the classification layers for the new task

newFCLayer = fullyConnectedLayer(3, 'Name', 'new_fc', 'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10);

lgraphNew = replaceLayer(lgraph, 'loss3-classifier', newFCLayer);

newClassLayer = classificationLayer('Name', 'new_classoutput');

lgraphNew = replaceLayer(lgraphNew, 'output', newClassLayer);

analyzeNetwork(lgraphNew)

% Resize the images and train the network

imageSize = net.Layers(1).InputSize;

augmentedTrainingSet = augmentedImageDatastore(imageSize, imdsTrain, 'ColorPreprocessing', 'gray2rgb');

augmentedValidateSet = augmentedImageDatastore(imageSize, imdsValidation, 'ColorPreprocessing', 'gray2rgb');

options = trainingOptions('sgdm', ...

'MiniBatchSize', 4, ...

'MaxEpochs', 8, ...

'InitialLearnRate', 1e-4, ...

'Shuffle', 'every-epoch', ...

'ValidationData', augmentedValidateSet, ...

'ValidationFrequency', 3, ...

'Verbose', false, ...

'ExecutionEnvironment', 'cpu', ...

'Plots', 'training-progress');

trainedNet = trainNetwork(augmentedTrainingSet, lgraphNew, options);

% Classify test set

YPred = classify(trainedNet, imdsTest);

YTest = imdsTest.Labels;

% Calculate performance metrics

accuracy = sum(YPred == YTest) / numel(YTest);

confMat = confusionmat(YTest, YPred);

precision = diag(confMat)' ./ sum(confMat, 1);

recall = diag(confMat)' ./ sum(confMat, 2);

f1 = 2 * (precision .* recall) ./ (precision + recall);

overallPrecision = mean(precision);

overallRecall = mean(recall);

overallF1 = mean(f1);

% Display performance metrics

disp(['Accuracy: ', num2str(accuracy)]);

disp('Confusion Matrix:');

disp(confMat);

disp('Precision:');

disp(precision);

disp(['Overall Precision: ', num2str(overallPrecision)]);

disp('Recall:');

disp(recall);

disp(['Overall Recall: ', num2str(overallRecall)]);

disp(['F1 Score: ', num2str(overallF1)]);

% Plot confusion matrix

categories = {'aluminiumcan', 'petbottle', 'drinkcartonbox'};

label = categorical(categories);

cm = confusionchart(confMat, label);

cm.RowSummary = 'row-normalized';

cm.ColumnSummary = 'column-normalized';

% Save network

save(fullfile(outputFolder, 'simpleDL.mat'), 'trainedNet', 'lgraph');

% Testing Process

I = imread('drinkcartonbox2.JPEG');

ds = augmentedImageDatastore(imageSize, I, 'ColorPreprocessing', 'gray2rgb');

predictedLabel = classify(trainedNet, ds);

disp(['The loaded image belongs to ', char(predictedLabel), ' class']);

0 Commenti
Mostra -2 commenti meno recentiNascondi -2 commenti meno recenti

Accedi per commentare.

Accedi per rispondere a questa domanda.

Answer 1

Gojo il 12 Mag 2024

0
Link

Link diretto a questa risposta

https://it.mathworks.com/matlabcentral/answers/2113456-model-selection-using-hold-out-validation#answer_1456141

Apri in MATLAB Online

Hey Tan,

The provided code seems correct, however I have a few suggestions. I am assuming you do not have to do any data preprocessing.

Although Hold-out Validation has its advantages for e.g. it is simple and does the job quickly, for a small dataset you should consider a K-Fold Cross-Validation to get more robust evaluation and reduce bias in the model.

Ensure that the dataset is balanced across different classes after splitting. The randomness in "shuffle" might not guarantee an even distribution of classes in each subset. Although you are displaying the count of labels in each dataset, I would suggest you to use some sampling techniques that handle the evenly distribution of data. You could use the "cvpartition" function for creating the partitions using the "stratify" option.

c = cvpartition(group,"Holdout",p,"Stratify",stratifyOption)

You can refer to the following documentation here: https://www.mathworks.com/help/stats/cvpartition.html

The model preparation and evaluation is done correctly. You could try data augmentation techniques for further improving your model. Please refer to the following:

https://www.mathworks.com/help/deeplearning/ug/image-augmentation-using-image-processing-toolbox.html

https://www.mathworks.com/help/deeplearning/ref/imagedataaugmenter.html

You may also enhance the model training my using multiple parallel workers with GPUs. You may find this documentation useful: https://www.mathworks.com/help/deeplearning/ug/deep-learning-with-matlab-on-multiple-gpus.html

I hope this helps!