Understanding MATLAB's built-in SVM cross-validation on fitcsvm

I have a dataset of 53 trials and I want to do leave-one-out cross-validation of a binary classifier. I tried to explicitly do the cross-validation of an SVM, with this code:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
'BoxConstraint', 0.046125, 'ClassNames', class_names};
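n_trials = size(input_data, 1); % 53 trials in this dataset
SVMModel = cell(n_trials, 1);
estimated_labels = cell(1, n_trials); % row cell array, one label per trial
for i_trial = 1:n_trials
%% Train on every trial except i_trial
train_set_indices = [1:i_trial-1 i_trial+1:n_trials];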
SVMModel{i_trial} = fitcsvm(input_data(train_set_indices, :), ...
true_labels(train_set_indices), SVM_params{:});
%% Predict
[estimated_labels(i_trial), score] = predict(SVMModel{i_trial}, ...
input_data(i_trial, :));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
which gives me class_error equal to 0.4151.
However, if I try MATLAB's built-in SVM cross-validation:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
'Leaveout', 'on', 'BoxConstraint', 0.046125, 'ClassNames', class_names};
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
then CSVM.kfoldLoss is equal to 0.3208. Why the difference? What am I doing wrong in my explicit cross-validation?
I did the same exercise with 'Standardize' set to false and 'KernelScale', 987.8107 (optimized hyperparameters), and the difference is more dramatic: class_error=0.4528, while CSVM.kfoldLoss=0.
Finally, I would also like to know what the training and validation sets were for each of the trained models in CSVM.Trained. I would like to call predict on each trained model with the left-out sample (trial) and compare the result with CSVM.kfoldPredict.
Update 1: I found that c.training and c.test return the indices of the training and test sets. However, this code
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, 'CVPartition', c,...
'BoxConstraint', BoxConstraint, 'ClassNames', class_names};
estimated_labels = cell(1,53);
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
for ii=1:53
estimated_labels(ii) = predict(CSVM.Trained{ii}, input_data(c.test(ii),:,1));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
gives me class_error=0.5849, which is different from CSVM.kfoldLoss (0.3208). Why the difference? Is this the right way to double-check the cross-validation?
Update 2: I attached the data.
Thanks!
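One possible way to double-check the built-in cross-validation is to pass an explicit cvpartition, call predict on each model in CSVM.Trained with that fold's test observation, and compare the result against kfoldPredict. The sketch below uses synthetic data (X, y, and the class names 'a' and 'b' are placeholders, not the attached dataset) and assumes the Statistics and Machine Learning Toolbox. Note that test(c, k) returns a logical mask over all observations, and the left-out observation of fold k is not necessarily observation k, so each prediction is stored at the masked position rather than at index k.

```matlab
% Synthetic stand-in data (assumption: replace with input_data / true_labels).
rng(0);                                   % reproducible example
n_trials = 40;
X = [randn(n_trials/2, 3) + 1; randn(n_trials/2, 3) - 1];
y = [repmat({'a'}, n_trials/2, 1); repmat({'b'}, n_trials/2, 1)];

c = cvpartition(n_trials, 'LeaveOut');    % one test observation per fold
CSVM = fitcsvm(X, y, 'KernelFunction', 'linear', 'Standardize', true, ...
    'CVPartition', c, 'ClassNames', {'a', 'b'});

builtin_labels = kfoldPredict(CSVM);      % one label per observation
manual_labels = cell(n_trials, 1);
for k = 1:c.NumTestSets
    test_mask = test(c, k);               % logical mask over all observations
    % Store the prediction at the position of the left-out observation,
    % which is not necessarily position k.
    manual_labels(test_mask) = predict(CSVM.Trained{k}, X(test_mask, :));
end

manual_error = mean(~strcmp(y, manual_labels));
n_disagree = sum(~strcmp(builtin_labels, manual_labels));
fprintf('manual error = %.4f, kfoldLoss = %.4f, label disagreements = %d\n', ...
    manual_error, kfoldLoss(CSVM), n_disagree);
```

If the fold order and the observation order differ, indexing the predictions by fold number would compare labels that belong to different trials, which could produce a mismatch like the one in Update 1. This is only a guess, since it depends on how c was created.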
  2 Comments
Image Analyst on 31 Aug 2020
No answers probably because you forgot to attach your data.
Carlos Mendoza on 31 Aug 2020
I didn't forget. I thought that the code would be enough. Probably an error.


Answers (1)

Xingwang Yong on 29 Sep 2020
Maybe kfoldLoss uses a different definition of loss than yours. Your definition is 1-accuracy.
https://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedkernel.kfoldloss.html?s_tid=srchtitle
  2 Comments
Xingwang Yong on 3 Oct 2020
class_error = error_count / n_trials;
= (n_trials - correct_count) / n_trials
= 1 - correct_count / n_trials
= 1 - accuracy
That is your definition of loss.
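To check whether the loss definition explains the gap, the loss function can be named explicitly when calling kfoldLoss. As far as I know, 'classiferror' is the default for cross-validated classification models, and it is the misclassification rate, i.e. the same 1 - accuracy definition. A minimal sketch on synthetic data (X and y are placeholders, not the attached dataset; Statistics and Machine Learning Toolbox assumed):

```matlab
% Synthetic two-class data (assumption: stands in for the real dataset).
rng(1);                                   % reproducible example
n = 30;
X = [randn(n/2, 2) + 1; randn(n/2, 2) - 1];
y = [ones(n/2, 1); -ones(n/2, 1)];        % numeric labels for brevity

% Leave-one-out cross-validated linear SVM.
CSVM = fitcsvm(X, y, 'KernelFunction', 'linear', 'Leaveout', 'on');

% Manual definition: 1 - accuracy of the out-of-fold predictions.
cv_labels = kfoldPredict(CSVM);
accuracy = mean(cv_labels == y);
one_minus_accuracy = 1 - accuracy;

% Built-in losses with the loss function named explicitly.
loss_classif = kfoldLoss(CSVM, 'LossFun', 'classiferror');
loss_hinge = kfoldLoss(CSVM, 'LossFun', 'hinge');

fprintf('1 - accuracy = %.4f, classiferror = %.4f, hinge = %.4f\n', ...
    one_minus_accuracy, loss_classif, loss_hinge);
```

If the first two numbers agree on your data as well, the discrepancy is more likely to come from how the folds are indexed than from the loss definition.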


Release

R2019b
