Classifier not working properly on test set

Warid Islam on 14 Jul 2020
Answered: Nipun Katyal on 21 Jul 2020
Hello,
I have trained an SVM classifier on a breast cancer feature set. I get a 10-fold cross-validation accuracy of 83% on the training set, but the accuracy on the test set is very poor. The data set has 1999 observations and 9 features. The training-to-test ratio is 0.6:0.4 (rows 1 to 1200 for training, rows 1201 to 1999 for testing). Any suggestions would be very much appreciated. Thank you.
X_train=table2array(new7(1:1200,1:9));
y_train=table2array(new7(1:1200,10));
X_test=table2array(new7(1201:1999,1:9));
y_test=table2array(new7(1201:1999,10));
Mdl = fitcsvm(...
    X_train, ...
    y_train, ...
    'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',...
    struct('AcquisitionFunctionName',...
    'expected-improvement-plus'));
% Perform cross-validation
partitionedModel = crossval(Mdl, 'KFold', 10);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validation_error = kfoldLoss(partitionedModel, 'LossFun', 'ClassifError'); % validation error
validationAccuracy = 1 - validation_error;
%% test model
oofLabel_n = predict(Mdl,X_test);
oofLabel_n = double(oofLabel_n); % convert from categorical to double
test_accuracy_for_iter = sum((oofLabel_n == (y_test)))/length(y_test)*100;
%% save model
saveCompactModel(Mdl,'mySVM');
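A quick way to check whether a sequential split is the problem is to count how many observations of each class end up in the training and test sets. The sketch below is self-contained and rests on assumptions: it builds a hypothetical label vector with classes 1 and 2, sorted by class (the counts of 1100 and 899 are invented for illustration, not taken from new7), and splits it at row 1200 exactly as the code above does. With the real data you would run only the counting lines on y_train and y_test.

```matlab
% Hypothetical labels sorted by class: all class 1 rows come before all
% class 2 rows (the class counts are made up for illustration).
y = [ones(1100,1); 2*ones(899,1)];

% Sequential split at row 1200, as in the question's code
y_train = y(1:1200);
y_test  = y(1201:1999);

% Count the observations of each class in each split
classes     = [1; 2];
countsTrain = arrayfun(@(c) sum(y_train == c), classes);
countsTest  = arrayfun(@(c) sum(y_test  == c), classes);
disp(table(classes, countsTrain, countsTest))
```

Here the training set gets 1100 class 1 rows and 100 class 2 rows, while the test set gets no class 1 rows at all, so the cross-validation accuracy says little about test performance.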

Answers (1)

Nipun Katyal on 21 Jul 2020
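clc
clear all
rawData = xlsread('new7.xlsx');   % numeric data: columns 1-9 are features, column 10 is the class label
m = size(rawData,1);              % number of observations (rows)
rng(1);                           % fix the random seed so the shuffle, and therefore the split, is reproducible
new7 = rawData(randperm(m),:);    % shuffle the rows so both classes are spread across the two splits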
X_train=new7(1:1200,1:9);
y_train=new7(1:1200,10);
X_test=new7(1201:1999,1:9);
y_test=new7(1201:1999,10);
Mdl = fitcsvm(...
    X_train, ...
    y_train, ...
    'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',...
    struct('AcquisitionFunctionName',...
    'expected-improvement-plus'));
% Perform cross-validation
partitionedModel = crossval(Mdl, 'KFold', 10);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validation_error = kfoldLoss(partitionedModel, 'LossFun', 'ClassifError'); % validation error
validationAccuracy = 1 - validation_error;
%% test model
oofLabel_n = predict(Mdl,X_test);
oofLabel_n = double(oofLabel_n); % convert from categorical to double
test_accuracy_for_iter = sum((oofLabel_n == (y_test)))/length(y_test)*100   % no semicolon: display the test accuracy (%)
%% save model
saveCompactModel(Mdl,'mySVM');
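If you look at your data set, you will see that the rows are ordered by class: the class 1 labels and the class 2 labels are grouped together. Because the data is split by row position without shuffling, the training set (on which the cross-validation accuracy is computed) contains mostly class 1 observations and only some class 2 observations, while the test set contains the opposite. The model is therefore tested on a class distribution very different from the one it was trained and validated on, which explains the poor test accuracy. Before splitting the data into training and test sets, shuffle the rows (as the randperm line in the code above does) so that the classes are distributed similarly across the two sets.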
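Shuffling with randperm makes the class proportions of the two sets only approximately equal. If you want each set to keep the class proportions of the full data set, one option (a different technique from the shuffle in the answer) is a stratified hold-out split with cvpartition from the Statistics and Machine Learning Toolbox: passing the label vector as the first argument stratifies the split by class. The sketch below is an illustration under assumptions: the sorted labels, their counts, the random 9-column feature matrix and the seed are all hypothetical stand-ins for new7.

```matlab
rng(1);                                   % reproducible partition
y = [ones(1100,1); 2*ones(899,1)];        % hypothetical labels sorted by class
X = randn(numel(y), 9);                   % hypothetical 9-column feature matrix

% Stratified 60/40 hold-out split: about 40% of each class goes to the test set
c = cvpartition(y, 'HoldOut', 0.4);
X_train = X(training(c), :);  y_train = y(training(c));
X_test  = X(test(c), :);      y_test  = y(test(c));

% The class 1 fraction should be close to 1100/1999 (about 0.55) in both sets
fracTrain = mean(y_train == 1);
fracTest  = mean(y_test  == 1);
fprintf('Class 1 fraction: train %.3f, test %.3f\n', fracTrain, fracTest);
```

Because the split no longer depends on the row order, it works even if the file stays sorted by class.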

Release: R2019a
