resampling an unbalanced dataset
Hi, I have a dataset with 2 classes (churn='False.' and churn='True.'). It is unbalanced because only 700 of the 5000 samples are churn='False.' Is there a way to balance that distribution? Thank you in advance.
Accepted Answer
Image Analyst
on 3 Jan 2015
Throw out all but 700 items where churn = true??? Then you'd have 700 false ones and 700 true ones. If not, then tell us in more detail what "balance" means to you.
3 Comments
Image Analyst
on 3 Jan 2015
Uh, sure, if that's what you want. If it's in a table, you can automate it somewhat, like
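% This assumes t.churn is a logical column. If it holds strings such as
% 'True.' in a cell array, convert it first: t.churn = strcmp(t.churn, 'True.');
% Find out which rows are true.
trueRows = find(t.churn);
% Take only the first 700 (or all of them if there are fewer than 700):
trueRows = trueRows(1:min(length(trueRows), 700));
% Find out which rows are false - we want to keep all those.
falseRows = find(t.churn == false);
% Combine the false and true rows into one list of indexes.
% find() returns column vectors here, so concatenate vertically.
rowsToExtract = sort([falseRows; trueRows]);
% Now extract only the first 700 true, but all the false.
t = t(rowsToExtract, :);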
or something like that. You might have to debug it some.
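Keeping the first 700 true rows could bias the result if the table happens to be sorted in some way, so you might prefer to pick the rows at random. Here is a rough sketch of that, assuming churn is a logical column. The table t, its id column, and the seed are made up purely for illustration; swap in your own data.

```matlab
% Hypothetical example data: 5000 rows, 700 of them false (the minority class).
rng(0);                                   % fix the seed so the draw is repeatable
churn = true(5000, 1);
churn(randperm(5000, 700)) = false;
t = table((1:5000)', churn, 'VariableNames', {'id', 'churn'});

% Row indexes of each class.
trueRows  = find(t.churn);
falseRows = find(~t.churn);

% Keep as many rows per class as the smaller class has.
n = min(numel(trueRows), numel(falseRows));

% Draw n rows at random, without replacement, from each class.
trueKeep  = trueRows(randperm(numel(trueRows), n));
falseKeep = falseRows(randperm(numel(falseRows), n));

% Extract the balanced subset, preserving the original row order.
balanced = t(sort([trueKeep; falseKeep]), :);
```

The rng(0) call is only there so the same subset comes out on every run; drop it if you want a different random subset each time.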