Why is the accuracy reported in the Classification Learner app much higher than the accuracy of the exported model on the held-out validation set?
Problem description
I used the default 5-fold cross-validation (CV) scheme in the Classification Learner app and trained all the available models. The best model (quadratic SVM) achieved 74.2% accuracy. I then used
Export Model => Generate Code
and ran the generated code, again examining the 5-fold CV accuracy. Surprisingly, the validation accuracy of the generated model was 66.8%, much lower than the accuracy reported by the app.
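For reference, this is roughly how I run the exported code (a minimal sketch; it assumes the generated function keeps its default name trainClassifier and returns the trained model together with the validation accuracy, and trainingData is a placeholder for the same table used in the app):

% Minimal sketch, assuming the default outputs of the code produced by Generate Code:
rng('default');   % fix the seed so the CV partition is reproducible across runs
[trainedClassifier, validationAccuracy] = trainClassifier(trainingData);  % trainingData: same table as in the app
fprintf('Exported model, 5-fold CV accuracy: %.1f%%\n', 100*validationAccuracy);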
This has happened repeatedly: every time I follow this procedure (with different datasets and models), I observe reduced accuracy for the exported model.
Potential explanations
I understand that not resetting the random number generator seed can cause some variability between runs, but this effect seems too large and consistent to be explained by chance. I also understand that choosing the empirically best model on the validation set, without an additional test set, can lead to an optimistic estimate of its performance. If the exported model uses a different CV partition, that could explain part of the drop in accuracy, but I wonder whether there is some other explanation that I am missing. A sketch of how I would check the partition effect is shown below.
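To get a feel for how much the partition alone can move the estimate, I would compare something like the following against the app's number (a sketch only; X and y are placeholders for the predictors and labels used in the app, a two-class problem is assumed so fitcsvm is called directly rather than fitcecoc with templateSVM, and the kernel settings approximate the app's quadratic SVM preset):

% Two different 5-fold partitions of the same data, same quadratic-kernel SVM settings.
rng(1);  c1 = cvpartition(y, 'KFold', 5);
rng(2);  c2 = cvpartition(y, 'KFold', 5);

opts = {'KernelFunction', 'polynomial', 'PolynomialOrder', 2, 'Standardize', true};
cvMdl1 = fitcsvm(X, y, opts{:}, 'CVPartition', c1);   % cross-validated model, partition 1
cvMdl2 = fitcsvm(X, y, opts{:}, 'CVPartition', c2);   % cross-validated model, partition 2

fprintf('CV accuracy, partition 1: %.1f%%\n', 100*(1 - kfoldLoss(cvMdl1)));
fprintf('CV accuracy, partition 2: %.1f%%\n', 100*(1 - kfoldLoss(cvMdl2)));

If the two accuracies differ by only a point or two, the partition alone would not account for a gap of more than seven percentage points.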