Calculating loss when cvpartition has been used within HyperparameterOptimizationOptions in fitcnb

% Basic set up with cross validation
Mdl=fitcnb(featuresTrain,targetTrain)
CVMdl=crossval(Mdl)
l=kfoldLoss(CVMdl) % kfoldLoss needs the cross-validated model, not Mdl
% Now, I set up stratification for cross validation
c = cvpartition(targetTrain,"KFold",10)
% Default Naive Bayes, with stratification
Mdl1_0=fitcnb(featuresTrain,targetTrain,...
'CVPartition',c)
loss1_0=kfoldLoss(Mdl1_0) % all OK. I get an answer
% output: 0.4ish
% Now, I optimize
Mdl1_2=fitcnb(featuresTrain,targetTrain,...
'OptimizeHyperparameters','auto',...
'HyperparameterOptimizationOptions',struct(...
'CVPartition',c,...
'AcquisitionFunctionName','expected-improvement-plus')) % for reproducibility
loss1_2 = kfoldLoss(Mdl1_2) % Error: Incorrect number or types of inputs or outputs for function kfoldLoss.
loss1_2_ = loss(Mdl1_2,featuresTrain,targetTrain) % Works, but the answer is considerably smaller than I expected
% output: 0.3ish
% Now, I test
loss1 = loss(Mdl1_2,featuresTest,targetTest)
% output: back to 0.4ish
I am attempting to stratify my cross validation using cvpartition.
This is fine at first. I use kfoldLoss and get a reasonable answer.
However, when I then optimize the hyperparameters, passing the same cvpartition through HyperparameterOptimizationOptions, I can no longer use kfoldLoss() (error above). Is this because the output Mdl1_2 is a single model whose hyperparameters have already been tuned by cross-validation, whereas Mdl1_0 is a cross-validated model made up of essentially 10 models, one per fold?
Assuming this is the case, I used loss() on the training data instead, but the value (about 0.3) is much lower than I expected, and the loss on my test data goes back up to about 0.4.
Maybe I've done everything correctly, and the difference is simply that the cross-validation folds within the training set are stratified, whereas the test set doesn't necessarily have the same class distribution? My data set is quite small (670 items), and these results came from an 85:15 train:test split.
Thank you.
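One way to test the idea about class distributions is to stratify the hold-out split itself: when cvpartition is given the class labels, its "Holdout" option keeps the class proportions roughly equal in the training and test sets. The following is a minimal sketch, assuming the fisheriris sample data shipped with Statistics and Machine Learning Toolbox as a stand-in for the poster's data; the variable names mirror the question, but the split is illustrative.

```matlab
% Sketch: stratified 85:15 hold-out split, then stratified 10-fold CV on the training part.
% fisheriris is used as a stand-in for the poster's data (assumption).
load fisheriris                      % meas: 150x4 features, species: 150x1 cellstr labels
rng("default")                       % make the random partitions reproducible

% Giving cvpartition the labels stratifies the hold-out split by class
cvHold = cvpartition(species, "Holdout", 0.15);
featuresTrain = meas(training(cvHold), :);
targetTrain   = species(training(cvHold));
featuresTest  = meas(test(cvHold), :);
targetTest    = species(test(cvHold));

% Print class counts and percentages to compare the two sets
tabulate(targetTrain)
tabulate(targetTest)

% Stratified 10-fold partition of the training set, as in the question
c = cvpartition(targetTrain, "KFold", 10);
```

If the two tabulate outputs show similar percentages, a distribution mismatch between training and test data is unlikely to explain the gap on its own.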

Answers (1)

Avadhoot on 21 Dec 2023
Hi Richard,
I understand that you are encountering an issue when trying to calculate the k-fold cross-validated loss of your tuned model. Your approach is correct. The error arises because "Mdl1_2" is a "ClassificationNaiveBayes" object (a single model trained on the full training set with the selected hyperparameters), while the "kfoldLoss" function requires a "ClassificationPartitionedModel" object. You could obtain one by manually cross-validating the tuned model with "crossval" after the fitting step, but there is a more straightforward method:
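During hyperparameter tuning, the cross-validated loss is computed for every evaluated set of hyperparameters, and the results are stored in the "HyperparameterOptimizationResults" property of the model (a BayesianOptimization object). Its "MinObjective" property holds the smallest cross-validated loss observed during the optimization, which you can retrieve with the following line of code: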
cvLoss = Mdl1_2.HyperparameterOptimizationResults.MinObjective;
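The manual route mentioned above, cross-validating the tuned model after fitting, can be sketched as follows. It reuses the same stratified partition so that kfoldLoss receives a ClassificationPartitionedModel. This is a minimal sketch, assuming fisheriris as stand-in data; the variable names (cvMdl, optMdl, cvOptMdl) are illustrative.

```matlab
% Sketch: get a k-fold loss for a tuned naive Bayes model by cross-validating it afterwards.
load fisheriris                          % stand-in data (assumption)
rng("default")                           % seed + 'expected-improvement-plus' for reproducibility
c = cvpartition(species, "KFold", 10);   % stratified 10-fold partition

% With 'CVPartition' as a direct argument, fitcnb returns a partitioned model
cvMdl = fitcnb(meas, species, "CVPartition", c);
disp(class(cvMdl))                       % ClassificationPartitionedModel

% With hyperparameter optimization, fitcnb returns one model trained on all observations
optMdl = fitcnb(meas, species, ...
    "OptimizeHyperparameters", "auto", ...
    "HyperparameterOptimizationOptions", struct( ...
        'CVPartition', c, ...
        'AcquisitionFunctionName', 'expected-improvement-plus', ...
        'ShowPlots', false, 'Verbose', 0));
disp(class(optMdl))                      % ClassificationNaiveBayes

% Cross-validate the tuned model on the same folds; now kfoldLoss accepts it
cvOptMdl  = crossval(optMdl, "CVPartition", c);
cvLossOpt = kfoldLoss(cvOptMdl, "LossFun", "classiferror");
fprintf("k-fold loss of tuned model: %.3f\n", cvLossOpt)
```

Because the same folds are reused, cvLossOpt should be close to the optimization's objective at the selected hyperparameters; it may not match MinObjective exactly if the point fitcnb selects differs from the best observed one.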
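Once you have the cross-validated loss, you can compute the loss on the test set with the "loss" function. Note that loss(Mdl1_2,featuresTrain,targetTrain) is the resubstitution loss: it measures the error on the same observations the model was trained on, so it is expected to be lower (more optimistic) than both the cross-validated loss and the test loss.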
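To see why loss(Mdl1_2,featuresTrain,targetTrain) came out lower than the test loss, the three estimates can be put side by side: the cross-validated loss from the optimization results, the resubstitution loss on the training data, and the loss on a held-out test set. This is a minimal sketch, assuming fisheriris as stand-in data and a stratified hold-out split; all losses are requested explicitly as classification error so they are on the same scale.

```matlab
% Sketch: compare CV, resubstitution and test loss for a tuned naive Bayes model.
load fisheriris                                   % stand-in data (assumption)
rng("default")
cvHold = cvpartition(species, "Holdout", 0.15);   % stratified 85:15 split
XTrain = meas(training(cvHold), :);  YTrain = species(training(cvHold));
XTest  = meas(test(cvHold), :);      YTest  = species(test(cvHold));
c = cvpartition(YTrain, "KFold", 10);             % stratified folds within the training set

optMdl = fitcnb(XTrain, YTrain, ...
    "OptimizeHyperparameters", "auto", ...
    "HyperparameterOptimizationOptions", struct( ...
        'CVPartition', c, ...
        'AcquisitionFunctionName', 'expected-improvement-plus', ...
        'ShowPlots', false, 'Verbose', 0));

results  = optMdl.HyperparameterOptimizationResults;                % BayesianOptimization object
cvLoss   = results.MinObjective;                                     % best observed cross-validated loss
resubErr = resubLoss(optMdl, "LossFun", "classiferror");            % error on the training data itself
testErr  = loss(optMdl, XTest, YTest, "LossFun", "classiferror");   % error on unseen data
fprintf("CV: %.3f  resubstitution: %.3f  test: %.3f\n", cvLoss, resubErr, testErr)
```

The resubstitution figure is typically the most optimistic, since the model is scored on observations it was fitted to. MinObjective is the minimum over all evaluated hyperparameter settings, so it can be slightly optimistic as well. The test loss is the least biased of the three, though with only about 100 test observations (15% of 670) it is also noisy.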
For additional details on the "kfoldLoss", "fitcnb", and "crossval" functions, refer to the following documentation links:
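  1. "kfoldLoss" function (RegressionPartitionedModel page; classification models use ClassificationPartitionedModel.kfoldLoss): https://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedmodel.kfoldloss.html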
  2. "fitcnb" function: https://www.mathworks.com/help/stats/fitcnb.html
  3. "crossval" function: https://www.mathworks.com/help/stats/crossval.html
I hope it helps.

Release

R2023b
