Random forest prediction probabilities

Question


Hi,

I trained a random forest model using MATLAB's "TreeBagger" function. However, when I use the "predict" function, the predicted probabilities are all 0 or 1, except for a few observations. Despite having 4000 observations, my ROC curve also has only three data points. Can you suggest a solution to this problem?

Thanks in advance.

4 Comments

Memo Remo on 13 Apr 2021

Edited: Memo Remo on 13 Apr 2021

TRAIN_MathWork.mat

Thanks for the reply,

The data is attached, and my code is copied below:

**************************************************

rng default
Y = TRAIN(:,7);                   % response: column 7 of TRAIN
X_select = [1,2,3,6];             % candidate predictor columns
X = TRAIN(:,X_select);
CVO = cvpartition(Y,'KFold',5);   % stratified 5-fold partition

for i = 1:CVO.NumTestSets
    i   % display the current fold number
    clear PredictedLabels PredictedProbabilities PredictedProbabilities_Cell Y_test X_test teIdx
    clear PredictedLabels4Tree TreeProb PredictedScores4Tree SelectedTree SelectedTreeID H idxvar
    clear Y_train X_train trIdx
    trIdx = CVO.training(i);
    X_train = X(trIdx,:);
    Y_train = Y(trIdx,:);

    % Screen predictors by out-of-bag permutation importance.
    b = TreeBagger(50,X_train,Y_train,'oobvarimp','on');
    idxvar = find(b.OOBPermutedVarDeltaError>0.75)

    % Refit on the selected predictors and find where the OOB error levels off.
    b5v = TreeBagger(100,X_train(:,idxvar),Y_train,'oobpred','on','OOBPredictorImportance','on');
    H = diff(oobError(b5v));
    SelectedTreeID = find(abs(H)<0.001);
    if isempty(SelectedTreeID)
        error('Increase the number of grown trees!')
    end

    % Note: this keeps a single tree of the ensemble, not the whole forest.
    SelectedTree = b5v.Trees{SelectedTreeID(1)};
    % predict on a tree returns class labels first and class scores second.
    [PredictedProbabilities4Tree, PredictedScores4Tree] = predict(SelectedTree,X_train(:,idxvar));
    TreeProb = cell2mat(PredictedProbabilities4Tree);
    for r = 1:size(PredictedProbabilities4Tree,1)
        PredictedLabels4Tree(r) = round(str2num(TreeProb(r)));
    end
    mdl_RF{i} = SelectedTree;
    [fprRF,tprRF,~,AUC_RF] = perfcurve(Y_train,PredictedScores4Tree(:,2),'1');

    teIdx = CVO.test(i);
    X_test = X(teIdx,:);
    Y_test = Y(teIdx,:);
    % Single-output predict returns class labels, so these values are 0 or 1.
    % The tree was trained on the idxvar columns, so index the test set the same way.
    PredictedProbabilities_Cell = predict(mdl_RF{i},X_test(:,idxvar));
    for m = 1:length(PredictedProbabilities_Cell)
        PredictedProbabilities(m,1) = str2num(PredictedProbabilities_Cell{m});
    end
    PredictedLabels = round(PredictedProbabilities);

    % Store the AUC under the fold index i (the original used an undefined j).
    [X_roc_RF{i},Y_roc_RF{i},T_roc_RF{i},AUCs_RF(i)] = perfcurve(Y_test,PredictedProbabilities,'1');
end
figure
plot(X_roc_RF{4},Y_roc_RF{4})

Memo Remo on 18 Apr 2021

Edited: Memo Remo on 18 Apr 2021

Any suggestions?


Answer 1

Aditya Patil on 10 May 2021

A predicted probability of exactly 1 means that every tree assigns the observation to the same class, which suggests that the model has overfitted.
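A minimal sketch of how the forest score arises, assuming synthetic data (the variables X, Y, and forest below are illustrative and are not the poster's TRAIN set). The second output of predict on a TreeBagger object is the class score averaged over all trees, so a score of exactly 1 requires every tree to agree. Note also that the code in the question calls predict on a single tree taken from the ensemble and keeps only its first output, which holds class labels rather than scores; that alone may explain a ROC curve with only three points.

```matlab
rng default                                  % reproducible synthetic data
n = 400;
X = randn(n, 4);                             % hypothetical predictors
Y = double(X(:,1) + 0.5*randn(n,1) > 0);     % hypothetical noisy 0/1 labels

cvo   = cvpartition(Y, 'KFold', 5);
trIdx = training(cvo, 1);
teIdx = test(cvo, 1);

% Grow a bagged ensemble of classification trees.
forest = TreeBagger(100, X(trIdx,:), Y(trIdx), 'Method', 'classification');

% First output: predicted labels (cell array of character vectors).
% Second output: class scores averaged over all trees, one column per
% class in the order given by forest.ClassNames.
[labels, scores] = predict(forest, X(teIdx,:));
posCol = strcmp(forest.ClassNames, '1');     % column of the positive class

% Pass the continuous scores, not the predicted labels, to perfcurve.
[fpr, tpr, ~, auc] = perfcurve(Y(teIdx), scores(:, posCol), 1);
fprintf('ROC points: %d, AUC: %.3f\n', numel(fpr), auc);
```

With scores from the whole forest, perfcurve has many distinct thresholds to sweep, so the curve is no longer limited to three points.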

Overfitting of this kind can be reduced by limiting the size of the trees. A few options that might help are:

MinLeafSize: Set this to a higher value
MaxNumSplits: Set this to a lower value
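The two options above can be sketched as follows, again on synthetic data. The values 20 and 10 are arbitrary assumptions chosen for illustration and would need tuning (for example against the out-of-bag error) on real data.

```matlab
rng default                                  % reproducible synthetic data
n = 400;
X = randn(n, 4);                             % hypothetical predictors
Y = double(X(:,1) + 0.5*randn(n,1) > 0);     % hypothetical noisy 0/1 labels

% Default settings: trees are grown deep, down to very small leaves.
deepForest = TreeBagger(100, X, Y);

% Constrained trees: larger leaves and a cap on the number of splits.
shallowForest = TreeBagger(100, X, Y, ...
    'MinLeafSize', 20, 'MaxNumSplits', 10);

[~, deepScores]    = predict(deepForest, X);
[~, shallowScores] = predict(shallowForest, X);

% Fraction of observations whose positive-class score is exactly 0 or 1.
fracExtreme = @(s) mean(s(:,2) == 0 | s(:,2) == 1);
fprintf('Extreme scores, default trees:     %.2f\n', fracExtreme(deepScores));
fprintf('Extreme scores, constrained trees: %.2f\n', fracExtreme(shallowScores));
```

Larger leaves contain a mix of classes more often, so each tree tends to report an intermediate class proportion instead of a hard 0 or 1.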

You can also try fitcensemble instead. See the TreeBagger and fitcensemble documentation for more details.
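A rough equivalent with fitcensemble might look like the sketch below. The synthetic data, the choice of the 'Bag' method, and the leaf size of 20 are assumptions made for illustration, not settings taken from the thread.

```matlab
rng default                                  % reproducible synthetic data
n = 400;
X = randn(n, 4);                             % hypothetical predictors
Y = double(X(:,1) + 0.5*randn(n,1) > 0);     % hypothetical noisy 0/1 labels

% Bagged classification trees with a minimum leaf size set on the template.
t   = templateTree('MinLeafSize', 20);
ens = fitcensemble(X, Y, 'Method', 'Bag', ...
    'NumLearningCycles', 100, 'Learners', t);

% With numeric Y the labels come back numeric; the score columns follow
% the order of ens.ClassNames.
[labels, scores] = predict(ens, X);
posCol = (ens.ClassNames == 1);
[fpr, tpr, ~, auc] = perfcurve(Y, scores(:, posCol), 1);
fprintf('ROC points: %d, AUC: %.3f\n', numel(fpr), auc);
```

The AUC here is computed on the training data, so it is optimistic; on real data the scores would come from held-out folds.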

Alternatively, you may want to use a different approach entirely, such as an SVM or another classifier.
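As one possible illustration of the SVM route, the sketch below trains an SVM on synthetic data and converts its scores to posterior probabilities with fitPosterior. The kernel, the standardization, and the data are all assumptions made for the example.

```matlab
rng default                                  % reproducible synthetic data
n = 400;
X = randn(n, 4);                             % hypothetical predictors
Y = double(X(:,1) + 0.5*randn(n,1) > 0);     % hypothetical noisy 0/1 labels

% Train an SVM with a Gaussian kernel on standardized predictors.
svm = fitcsvm(X, Y, 'KernelFunction', 'rbf', ...
    'KernelScale', 'auto', 'Standardize', true);

% Fit a score-to-posterior transformation so that predict returns
% probabilities instead of raw SVM scores.
svm = fitPosterior(svm);
[~, posterior] = predict(svm, X);

% Columns follow svm.ClassNames; pick the positive class for the ROC curve.
posCol = (svm.ClassNames == 1);
[fpr, tpr, ~, auc] = perfcurve(Y, posterior(:, posCol), 1);
fprintf('ROC points: %d, AUC: %.3f\n', numel(fpr), auc);
```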

1 Comment

Memo Remo on 22 Jun 2021

Thanks a lot Aditya! Sorry for the late reply.
